Refactor 2026may #276

Open
syoyo wants to merge 146 commits into dev from refactor-2026may

syoyo commented May 24, 2026

Collaborator

No description provided.

syoyo and others added 30 commits May 22, 2026 05:48
… cycle, split attribute-eval

Compile-time reduction work for core + tydra (measured with clang -ftime-trace +
isolated per-TU timing):

- Remove src/ascii-parser-basetype-typedarray.cc: its entire body was
  `#include "ascii-parser-basetype.cc"` with no section guards, so it recompiled
  the full ~4.6k-line TU a second time. Because the library is built as a static
  archive, the linker pulls only one of the two identical objects, so the second
  compile was pure waste (~38s CPU per clean build).

- Add opt-in precompiled-header support (TINYUSDZ_USE_PCH, default OFF) with
  src/tinyusdz-pch.hh and a ccache sloppiness wrapper. Measured to NOT help this
  codebase (cost is template instantiation + -O3 codegen, not header parsing), so
  it is disabled by default and kept only as a knob.

- Break the value-types.hh <-> timesamples.hh include cycle: value-types.hh no
  longer includes timesamples.hh; TypeTraits<TimeSamples> moves to the end of
  timesamples.hh (where the type is complete). crate-format.hh and sconv-detail.hh
  gain an explicit timesamples.hh include. Stops value-types.hh consumers that do
  not need timesamples from transitively parsing it.
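
A minimal single-file sketch of the cycle break, assuming a much simplified TypeTraits
with only name() and size members (the real trait is not shown in this PR text). The
comments mark which header each part would live in; C++17 is assumed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// --- value-types.hh side (sketch): primary template only. ---
// Nothing here names TimeSamples, so this header needs no timesamples include.
template <typename T>
struct TypeTraits;

template <>
struct TypeTraits<float> {
  static std::string name() { return "float"; }
};

// --- timesamples.hh side (sketch): the type first, then its trait. ---
struct TimeSamples {
  std::vector<double> times;
};

// Specialized at the end of the header, where TimeSamples is complete,
// so members such as sizeof(TimeSamples) are usable.
template <>
struct TypeTraits<TimeSamples> {
  static std::string name() { return "TimeSamples"; }
  static constexpr std::size_t size = sizeof(TimeSamples);
};
```

Consumers that only need TypeTraits<float> never see TimeSamples; those that need
TypeTraits<TimeSamples> include the timesamples header explicitly.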

- Split tydra/attribute-eval-typed-all.cc (~432 explicit instantiations in one TU)
  into a shared attribute-eval-typed-impl.inc (generic template bodies), the
  std::string specializations (attribute-eval-typed-all.cc), and per-type-group
  instantiation TUs (attribute-eval-typed-inst-{scalar,array}.cc). value-type-macros.inc
  now exposes composable SCALAR/ARRAY sub-lists with NO_STRING as their union
  (single source of truth). Worst-case per-TU compile for this group drops ~45s -> ~27s.

All native tests (ctest, 14/14) pass; gcc + clang builds verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng TU

render-data-mesh.cc was the largest TU in the project (5637 lines). Extract the
self-contained tangent/normal/quantization helpers (ComputeTangentsAndBinormals,
ComputeNormals, QuantizeMeshTangents, TryQuantizedNormalDedup, QuantizeMeshNormals,
plus the inline GeometricNormal helper) into render-data-mesh-tangent.cc, with
cross-TU declarations in render-data-mesh-internal.hh.

This moves the mikktspace-heavy tangent codegen off render-data-mesh.cc's critical
path. Isolated per-TU compile: render-data-mesh.cc 20.8s + render-data-mesh-tangent.cc
6.1s (compile in parallel) vs the prior single ~27s+ monolith.

ctest 14/14 pass on gcc; clang build verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… separate TUs

scene-access.cc (4491 lines) compiled all ~38 ListPrims/ListShaders explicit
instantiations (~30 prim types + 9 shader types) inline. Move the template bodies
(ListPrims, ListShaders, and their iterative traversal helpers TraverseIterative /
TraverseShaderIterative) into a shared scene-access-traverse-impl.inc, and emit the
explicit instantiations from two dedicated TUs (scene-access-listprims-inst.cc,
scene-access-listshaders-inst.cc). The extern template declarations already present
in scene-access.hh suppress implicit instantiation in scene-access.cc and consumers.
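
The header / impl.inc / inst-TU layering can be sketched in one file as follows.
CountMatching is a hypothetical stand-in for ListPrims/ListShaders, and the three
sections would normally live in separate files; this only illustrates the mechanism.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// --- header (sketch): declaration plus explicit instantiation declarations.
template <typename T>
std::size_t CountMatching(const std::vector<T>& items, const T& needle);

// "extern template" tells every includer not to instantiate these itself.
extern template std::size_t CountMatching<int>(const std::vector<int>&,
                                               const int&);
extern template std::size_t CountMatching<std::string>(
    const std::vector<std::string>&, const std::string&);

// --- shared impl.inc (sketch): the template body, included only by inst TUs.
template <typename T>
std::size_t CountMatching(const std::vector<T>& items, const T& needle) {
  std::size_t count = 0;
  for (const T& item : items) {
    if (item == needle) ++count;
  }
  return count;
}

// --- dedicated *-inst.cc (sketch): explicit instantiation definitions.
// The codegen cost is paid here once, in a TU that compiles in parallel.
template std::size_t CountMatching<int>(const std::vector<int>&, const int&);
template std::size_t CountMatching<std::string>(
    const std::vector<std::string>&, const std::string&);
```

Callers link against the symbols emitted by the inst TU, so the main TU no longer
carries the instantiation work.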

Isolated per-TU compile: scene-access.cc 20.3s + listprims-inst 7.9s +
listshaders-inst 5.4s — the ~13s of instantiation work now compiles in parallel,
off scene-access.cc's critical path.

ctest 14/14 pass on gcc; clang build verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nto impl.inc + inst TU

ascii-parser-basetype.cc (4608 lines, ~25-30s isolated) carried ~131 explicit
ParseBasicTypeArray/ParseTupleArray instantiations plus the member-template bodies
(SepBy1BasicType, ParseTupleArray, ParseBasicTypeArray, MaybeNonFinite, the
vector ReadBasicType overloads) and the file-local parse helpers.

Move the 5 anon-namespace helpers (now inline) + all 15 member-template
definitions into ascii-parser-basetype-impl.inc, included at the top of namespace
ascii by both the main TU (so its non-template overloads still resolve their calls)
and a new ascii-parser-basetype-inst.cc that emits the explicit instantiations.
The non-template ReadBasicType/ParseMatrix overloads and template<> specializations
stay in ascii-parser-basetype.cc.

Isolated per-TU compile: ascii-parser-basetype.cc 8.3s + ascii-parser-basetype-inst.cc
12.0s (parallel) vs the prior single ~25-30s TU. The heavy array-type instantiation
codegen now compiles off the main TU's critical path.

ctest 14/14 pass on gcc; clang build verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nto sibling TU

render-data-material.cc (4773 lines) contained a ~1200-line anonymous-namespace
MaterialX node-graph constant evaluator (EvaluateMtlxConstant, ResolveInput, RI,
BinOp, RGB/HSV, ResolveNodeGraphTarget, ExtractMtlxNodeGraphInfo, ...). Extract it
into render-data-material-mtlx.cc. The two entry points called from the main TU
(EvaluateMtlxNodeGraphAsConstant, ExtractMtlxNodeGraphInfo) and the value structs
they expose (MtlxConstVal, MtlxNodeGraphInfo) move to render-data-material-internal.hh;
the remaining evaluator helpers stay file-local (anonymous namespace) in the new TU.

Isolated per-TU compile: render-data-material.cc 14.0s + render-data-material-mtlx.cc
6.4s (parallel) vs the prior single ~20s+ TU.

ctest 14/14 pass on gcc; clang build verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ns into sibling TU

prim-reconstruct.cc was the single heaviest TU in the project (2257 lines, codegen-
dominated). Move the Geom* ReconstructPrim<T> specializations (GeomSphere, GeomMesh,
GeomCube, GeomCamera, GeomSubset, GeomPointInstancer, ...) into prim-reconstruct-geom.cc.

The shared anonymous-namespace helpers used by both the moved geom specs and the
remaining specs (ReconstructGPrimProperties, ReconstructMaterialBindingProperties,
ReconstructCollectionProperties) plus the RECONSTRUCT_SIMPLE_GEOM_PRIM_BODY macro move
into prim-reconstruct-geom-detail.inc, included by both TUs (anonymous namespace gives
each TU its own internal-linkage copy). ReconstructXformOpsFromProperties stays external
in prim-reconstruct.cc (it is declared in prim-reconstruct.hh and used by sibling TUs).

Isolated per-TU compile: prim-reconstruct.cc 29.0s + prim-reconstruct-geom.cc 26.4s
(parallel) vs the prior single ~50s+ TU. Further geom subgrouping can reduce this more.

ctest 14/14 pass on gcc; roundtrip (tests/run-usdcat-compare.sh) shows 560 equivalent
with the only diffs being pre-existing apiSchemas/Physics-metadata divergences unrelated
to geometry reconstruction; clang build verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eom2.cc

prim-reconstruct-geom.cc (the geom ReconstructPrim<T> specializations, ~26s isolated)
splits cleanly since it already shares prim-reconstruct-geom-detail.inc. Move the heavy
mesh group (GeomMesh, GeomCamera, GeomSubset, GeomPointInstancer) into
prim-reconstruct-geom2.cc; the simple shape specializations (GeomSphere, Cone, Cube,
Cylinder, Capsule, Plane, Points, ...) stay in prim-reconstruct-geom.cc. Both include
the same prologue + detail.inc.

Isolated per-TU compile: prim-reconstruct-geom.cc 16.1s + prim-reconstruct-geom2.cc
16.4s (parallel) vs the prior single 26.4s TU.

ctest 14/14 pass on gcc; clang build verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ons into sibling TU

Move the 11 light ReconstructPrim<T> specializations (SphereLight, RectLight, DiskLight,
CylinderLight, DistantLight, GeometryLight, PortalLight, DomeLight, DomeLight_1,
LightFilter, PluginLightFilter) plus the RECONSTRUCT_LIGHT_PRIM_BODY macro into a new
prim-reconstruct-lightprim.cc. It includes the shared prim-reconstruct-geom-detail.inc
for the reconstruction helpers; ReconstructXformOpsFromProperties stays external in
prim-reconstruct.cc (header-declared). The PrimSpec-variant light specializations
(generated by RECONSTRUCT_PRIM_PRIMSPEC_IMPL) remain in prim-reconstruct.cc.

Mark the shared geom-detail.inc helpers [[maybe_unused]] so TUs that include the inc but
do not use every helper (e.g. the lights TU) don't trip -Werror=unused-function on GCC.

Also remove the stale, orphaned prim-reconstruct-light.cc (8 outdated light specs, not in
any build file).

Isolated per-TU compile: prim-reconstruct.cc 29.0s -> 16.3s + prim-reconstruct-lightprim.cc
14.6s. The prim-reconstruct family is now 5 parallel TUs (~16s each) where it began as a
single ~74s TU.

ctest 14/14 pass on gcc; clang build verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
to_string(Shader) had the largest stack frame in the project (~48 KB). Its
if/else-if chain extracted each candidate shader type with
shader.value.get_value<T>(), which returns nonstd::optional<T> by value -- a full
copy of the (large) shader struct. Because the chain nests via `else if`, all ~15
optionals stay in scope simultaneously, so the compiler reserved stack for every
copy.

Switch to shader.value.as<T>() (returns const T*, no copy) and dereference at the
print call. print_shader_params already takes const T&.
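
A small sketch of the difference between the two accessors, assuming C++17 and using
std::variant / std::optional as stand-ins for the library's own value holder and
nonstd::optional. ShaderHolder, SurfaceShader and TextureShader are hypothetical.

```cpp
#include <array>
#include <optional>
#include <string>
#include <variant>

// Hypothetical large shader structs (the payload stands in for many params).
struct SurfaceShader {
  std::string id = "surface";
  std::array<float, 512> params{};
};
struct TextureShader {
  std::string id = "texture";
  std::array<float, 512> params{};
};

struct ShaderHolder {
  std::variant<SurfaceShader, TextureShader> value;

  // Copying accessor: each call materializes a full T inside the optional.
  template <typename T>
  std::optional<T> get_value() const {
    if (const T* p = std::get_if<T>(&value)) return *p;
    return std::nullopt;
  }

  // Non-copying accessor: a pointer to the stored object, or nullptr.
  template <typename T>
  const T* as() const {
    return std::get_if<T>(&value);
  }
};

// The chain holds pointers only, so the frame stays small however many
// shader types are tested.
std::string DescribeShader(const ShaderHolder& shader) {
  if (const SurfaceShader* s = shader.as<SurfaceShader>()) {
    return s->id;
  } else if (const TextureShader* t = shader.as<TextureShader>()) {
    return t->id;
  }
  return "unknown";
}
```

With get_value<T>() in the same chain, every branch would declare an optional<T> that
remains in scope through the following else-if arms.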

Stack frame for to_string(Shader): 48200 -> 1016 bytes (-fstack-usage, clang -O3).
Behavior is unchanged (read-only pretty-print; as<T>() resolves the same concrete
shader types as get_value<T>()). ctest 14/14 pass on gcc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…growth)

RemoveInactivePrimsRec recursed over the PrimSpec hierarchy (erase inactive
children, then recurse into each surviving child's children), so a deeply nested
prim tree could overflow the call stack. Rewrite as an iterative DFS using an
explicit heap worklist of children-vector pointers: pop a level, erase its inactive
children, then push the surviving children's child lists. Result is identical
(per-level erase is order-independent) but stack depth is now O(1) regardless of
tree depth. Pointers into a children vector stay valid once its inactive entries have
been erased, because that vector is not mutated again and tree ownership is stable for
the walk.
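
A minimal sketch of that worklist shape, assuming C++17. PrimNode is a hypothetical
stand-in for PrimSpec; only the traversal pattern is meant to match the description.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a PrimSpec: a name, an active flag, children.
struct PrimNode {
  std::string name;
  bool active = true;
  std::vector<PrimNode> children;
};

// Iterative removal of inactive prims. The worklist lives on the heap,
// so call-stack depth does not grow with tree depth.
void RemoveInactivePrims(std::vector<PrimNode>& roots) {
  std::vector<std::vector<PrimNode>*> worklist;
  worklist.push_back(&roots);

  while (!worklist.empty()) {
    std::vector<PrimNode>* level = worklist.back();
    worklist.pop_back();

    // Erase inactive children of this level first.
    level->erase(std::remove_if(level->begin(), level->end(),
                                [](const PrimNode& n) { return !n.active; }),
                 level->end());

    // This level is not mutated again, so pointers to the survivors'
    // child vectors remain valid while they wait on the worklist.
    for (PrimNode& child : *level) {
      worklist.push_back(&child.children);
    }
  }
}
```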

Part of eliminating unbounded recursion in core (stack-overflow hardening). Mirrors
the existing iterative DFS pattern in print_prim (pprinter.cc).

ctest 14/14 pass on gcc; roundtrip 560 equivalent (no new diffs vs baseline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ning)

Rewrite three unbounded recursive tree traversals as iterative pre-order DFS using
explicit heap worklists, so deeply nested input cannot overflow the call stack:

- VariantConverter::TraverseForVariants (tydra) — visit prim's variantSets, recurse
  children.
- CompositionGraph::BuildPrimIndex (composition) — build a PrimIndex per prim,
  recurse children. Order-sensitive (instance-prototype selection is "first match
  wins") so children are pushed in reverse to preserve the exact left-to-right
  pre-order; a child failure still aborts the whole walk as before.
- MaterialXParser::ValidateNode (mtlx) — validate node, recurse children.

All three push children in reverse so visitation order (and thus warning/index
accumulation order) matches the original recursion exactly. Pointers into the
read-only source trees stay valid for the walk. Stack depth is now O(1) regardless
of tree depth.
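
The reverse-push detail can be sketched as follows (C++17 assumed; Node and
PreOrderNames are hypothetical names, not the functions touched by this commit).

```cpp
#include <string>
#include <vector>

struct Node {
  std::string name;
  std::vector<Node> children;
};

// Iterative pre-order DFS. A stack pops the most recently pushed entry,
// so children are pushed right-to-left to be visited left-to-right.
std::vector<std::string> PreOrderNames(const Node& root) {
  std::vector<std::string> visited;
  std::vector<const Node*> stack;
  stack.push_back(&root);

  while (!stack.empty()) {
    const Node* node = stack.back();
    stack.pop_back();
    visited.push_back(node->name);

    for (auto it = node->children.rbegin(); it != node->children.rend();
         ++it) {
      stack.push_back(&*it);
    }
  }
  return visited;
}
```

Pushing the children in forward order would still visit every node, but siblings would
come out right-to-left, which matters for any "first match wins" logic.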

Follows the RemoveInactivePrimsRec / print_prim iterative-DFS pattern. Most other
core+tydra recursion is already depth-guarded (kMaxPrimNestLevel, kMaxIter, etc.).

ctest 14/14 pass on gcc; roundtrip 560 equivalent (no new diffs vs baseline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hardening)

These recursive functions build NESTED output (so a clean iterative rewrite would
need a manual partial-result stack); instead bound their recursion depth, matching
the existing kMaxDefaultTraversalLimit idiom (ReconstructPrimFromPrimSpecRec,
ComputeDiffImpl). Deeply nested / malicious input now bails gracefully instead of
overflowing the call stack:

- tydra ValueToJSON / JSONToValue / ValueTypeToJSONSchema: add a defaulted
  `uint32_t depth = 0` parameter (public callers unaffected), guard at entry, pass
  depth+1 at the nested-value recursion sites.
- tydra ThreeJSSceneExporter::ConvertNode: same defaulted-depth guard for the node
  hierarchy.
- print_primspec (pprinter.cc) and print_meta (pprint-meta.cc): guard on the
  existing `indent` argument, which strictly increases per nesting level — for
  print_primspec this bounds the print_primspec <-> print_variantSetSpecStmt
  mutual recursion on deep variant trees.
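
A minimal sketch of the defaulted-depth guard, assuming C++17. JsonLikeValue, ToJson and
the limit kMaxJsonDepth = 64 are hypothetical; the actual limits used in tydra are not
stated in this message.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical nested value: a leaf number, or a list of nested values.
struct JsonLikeValue {
  int number = 0;
  std::vector<JsonLikeValue> items;
};

constexpr std::uint32_t kMaxJsonDepth = 64;  // assumed limit for this sketch

// Public callers omit `depth`; recursion sites pass depth + 1.
// Returns false instead of overflowing the stack on very deep input.
bool ToJson(const JsonLikeValue& value, std::string* out,
            std::uint32_t depth = 0) {
  if (depth > kMaxJsonDepth) return false;  // guard at entry

  if (value.items.empty()) {
    *out += std::to_string(value.number);
    return true;
  }

  *out += "[";
  for (std::size_t i = 0; i < value.items.size(); ++i) {
    if (i > 0) *out += ",";
    if (!ToJson(value.items[i], out, depth + 1)) return false;
  }
  *out += "]";
  return true;
}
```

The recursion itself is kept, which preserves the nested-output structure; only its
depth is bounded.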

ctest 14/14 pass on gcc; roundtrip 560 equivalent (no new diffs vs baseline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a non-template TimeSamples::get_scalar(void*, t, interp) switch-on-_type_id
dispatcher plus a per-type get_scalar_impl<T> that reads POD samples straight from
the flat binary _data buffer (no value::Value reconstruction), with a value::Value
fallback for generic storage (string/token/dict/AssetPath). Both are defined and
instantiated only in timesamples.cc, so the heavy per-type code is not re-emitted
in every includer.

get_scalar_impl<T> mirrors the existing get<T>() semantics exactly: default-time ->
first non-blocked sample; single-sample; Held via upper_bound; Linear via lower_bound
idx0/idx1 with blocked-endpoint fallback. A public eval_scalar<T>() transition hook
(header) does the role-compatible type-acceptance check (mirroring value::Value::as<T>)
and forwards to get_scalar; in Phase 3 it becomes the body of get<T>().
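
The Held / Linear lookup rules can be sketched on a plain float series as below. This is
a simplification under stated assumptions: TimeSeries and EvalScalar are hypothetical,
samples are pre-sorted by time, and default-time and blocked-sample handling are omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Interp { Held, Linear };

// Hypothetical flat storage: times sorted ascending, one value per time.
struct TimeSeries {
  std::vector<double> times;
  std::vector<float> values;
};

bool EvalScalar(const TimeSeries& s, double t, Interp interp, float* out) {
  const std::size_t n = s.times.size();
  if (n == 0 || n != s.values.size()) return false;
  if (n == 1) {  // single sample: constant
    *out = s.values[0];
    return true;
  }

  if (interp == Interp::Held) {
    // First time strictly greater than t; the held sample is the one before.
    auto it = std::upper_bound(s.times.begin(), s.times.end(), t);
    const std::size_t idx =
        (it == s.times.begin())
            ? 0
            : static_cast<std::size_t>(it - s.times.begin()) - 1;
    *out = s.values[idx];
    return true;
  }

  // Linear: first time >= t is the upper endpoint, its predecessor the lower.
  auto it = std::lower_bound(s.times.begin(), s.times.end(), t);
  const std::size_t idx1 =
      std::min(static_cast<std::size_t>(it - s.times.begin()), n - 1);
  const std::size_t idx0 = (idx1 == 0) ? 0 : idx1 - 1;
  const double dt = s.times[idx1] - s.times[idx0];
  const double u =
      (dt > 0.0) ? std::clamp((t - s.times[idx0]) / dt, 0.0, 1.0) : 0.0;
  *out = static_cast<float>((1.0 - u) * s.values[idx0] + u * s.values[idx1]);
  return true;
}
```

Queries outside the sampled range clamp to the first or last sample in this sketch.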

Purely additive — the new path is unreachable until the Phase 1.5 parity test exercises
it, so existing behavior is unchanged. This sets up eliminating TypedTimeSamples<T>
(unblocks the ~40s compile win + fixes the get-impl.inc ODR bug).

Builds clang + gcc; ctest 14/14.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…et<T>

Add a parity guard in timesamples_test that asserts the new binary-direct
TimeSamples::eval_scalar<T>() returns identical results to the existing
TimeSamples::get<T>() (value::Value path), across {default-time, Held, Linear} x
{two-sample, single, blocked-middle, blocked-endpoints, dedup, role-type, non-lerp
int, generic token}. Confirms the binary-direct evaluator (incl. the typed lerp vs
value-level Lerp) matches bit-for-bit before any TypedTimeSamples deletion.

This permanent guard gates Phase 3 (deleting TypedTimeSamples<T>).

ctest 14/14 (gcc).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the always-present TypedTimeSamples<T> member of Animatable<T> with a
heap-allocated value::TimeSamples owned via std::unique_ptr (8 bytes, nullptr for
the common scalar-only case) for registered value types, keeping a typed
TypedTimeSamples<T> only for types value::TimeSamples cannot hold (enums and
non-registered structs such as Extent), selected at compile time via a
has_value_type_traits<T> SFINAE discriminator.

User-defined deep copy + explicit move keep value semantics with the unique_ptr
member. Existing consumers stay source-compatible: get_timesamples() now returns a
TypedTimeSamples<T> by value (rebuilt via from_timesamples for value types; bound
by const-ref at call sites), and set(const TypedTimeSamples<T>&) converts into the
type-erased store. A get_timesamples_ptr() accessor is the migration target for
2d-2h; the typed compat shims are removed in Phase 3.
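
A rough sketch of the compile-time storage selection, assuming C++17. Only the name
has_value_type_traits comes from this PR; its implementation here, the type_id member,
ErasedSamples, TypedSamples, SampleStorage and DeepCopy are all assumptions made for
illustration.

```cpp
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Registered value types specialize TypeTraits; others get the empty primary.
template <typename T>
struct TypeTraits {};
template <>
struct TypeTraits<float> {
  static constexpr int type_id = 1;
};

// SFINAE discriminator: true only when TypeTraits<T>::type_id exists.
template <typename T, typename = void>
struct has_value_type_traits : std::false_type {};
template <typename T>
struct has_value_type_traits<T,
                             std::void_t<decltype(TypeTraits<T>::type_id)>>
    : std::true_type {};

struct ErasedSamples {  // stand-in for the type-erased store
  std::vector<double> times;
};
template <typename T>
struct TypedSamples {  // stand-in for the typed store kept for enums
  std::vector<std::pair<double, T>> samples;
};

// Registered types: pointer-sized, null until samples exist.
// Everything else: typed storage held inline.
template <typename T>
using SampleStorage =
    std::conditional_t<has_value_type_traits<T>::value,
                       std::unique_ptr<ErasedSamples>, TypedSamples<T>>;

enum class Visibility { Inherited, Invisible };

// Deep copy keeps value semantics despite the unique_ptr member.
inline std::unique_ptr<ErasedSamples> DeepCopy(
    const std::unique_ptr<ErasedSamples>& src) {
  return src ? std::make_unique<ErasedSamples>(*src) : nullptr;
}
```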

Completing the migration exposed pre-existing gaps in value::TimeSamples binary
reconstruct coverage: reconstruct_binary_sample() and the Phase-1 get_scalar()
switch were missing Extent, the seven half role types (color3h/color4h/point3h/
normal3h/vector3h/texcoord2h/texcoord3h) and timecode. Added those arms (+ include
core/extent.hh) so value-typed timesamples of those types round-trip correctly
(fixes extent-001.usda).

Builds clang + gcc; ctest 14/14; roundtrip 560 equivalent (baseline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s shim

Use Animatable<T>::get_timesamples_ptr() (the type-erased value::TimeSamples) instead
of the typed get_timesamples() compat shim for value types:

- tydra/scene-access.cc: the two value-type ToTypelessTimeSamples(get_timesamples())
  call sites now copy the internal value::TimeSamples directly (no typed round-trip);
  ToTypelessTimeSamples is deleted (the enum variant stays). The enum
  EnumTimeSamplesToTypelessTimeSamples path is unchanged.
- stage.cc: the two memory estimators (allocated + size-based) now delegate to
  value::TimeSamples::estimate_memory_usage()/estimate_actual_usage() on
  get_timesamples_ptr(), dropping the per-element TypedTimeSamples walk.

Step toward removing the value-type get_timesamples() compat shim (and the
TypedTimeSamples<value type> instantiations it pulls) in Phase 3.

Builds clang + gcc; ctest 14/14; roundtrip 560 equivalent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…2026may)

Add the TimeSamples/any/heavy-template refactor design doc, plus an Implementation
Progress section recording what is done on branch refactor-2026may (Phase 1, 1.5, 2a,
2e/2f — all verified: clang+gcc, ctest 14/14, roundtrip 560), the two design decisions
discovered during 2a (enum/Extent storage hybrid; completing value::TimeSamples binary
reconstruct coverage), the established consumer-migration pattern, and the remaining
steps (Phase 2b-2h consumer migration, Phase 3 deletion + ODR fix + compile win,
Phases 4-5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace value-type Animatable::get_timesamples() (typed compat shim) with
get_timesamples_ptr() (the type-erased value::TimeSamples) at the crate-writer /
conversion consumers:

- sconv-detail.hh, sconv-shader.cc (5), sconv-geom.cc (4): the "rebuild value::TimeSamples
  from typed samples" loops collapse to copying *get_timesamples_ptr() directly. The two
  role->underlying crate conversions are preserved by iterating the store and extracting
  via as<T>() (color3f -> float3, texcoord2f -> float2). The Visibility-enum site
  (sconv-geom) stays on the typed path (enums keep TypedTimeSamples).
- prim-types.cc: token timesamples -> string read from the value::TimeSamples store.

Builds clang + gcc; ctest 14/14; roundtrip 560 equivalent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s shim

In render-animation-converter.cc and render-data-anim.cc, convert the value-type
Animatable timesamples consumers (skel translations/rotations/scales/blendShape
weights) from the typed get_timesamples() + FOREACH_TIMESAMPLES macro to explicit
loops over get_timesamples_ptr()->get_samples() extracting each sample via
value::Value::as<std::vector<T>>(). The .size() counters use get_timesamples_ptr().

The FOREACH_TIMESAMPLES macro is intentionally left in place: it is also used for
XformOp timesamples, where the source is already a value::TimeSamples (xformOp
get_timesamples() returns optional<value::TimeSamples>) and sample.value is a
value::Value — those usages are unaffected.

Builds clang + gcc; ctest 14/14; roundtrip 560 equivalent (skel-animation paths verified).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed shim

pprint-detail.hh: print value-type timesamples from the type-erased value::TimeSamples
store while keeping byte-identical output:
- print_typed_timesamples stays typed (for the enum branch); add
  print_value_typed_timesamples<T> that extracts each sample via value::Value::as<T>()
  and streams *pv (same as the typed `ss << T`).
- print_str_timesamples now reads the store (extracts std::string).
- token timesamples are dual: print_typed_token_timesamples<T> (enum-as-token, kept
  typed) + print_value_token_timesamples (value::token from the store), selected by
  `if constexpr (animatable_detail::has_value_type_traits<T>)` at the call site.
- print_animatable_timesamples branches value (ptr) vs enum (typed).

layer-to-renderscene.hh: ExtractAnimatable (no callers) retyped to value::TimeSamples*
via get_timesamples_ptr().

Note: an earlier attempt routed value::Value through pprint_value() (which returns the
"VALUE_PPRINT" stub for some types) and mis-handled enum-as-token (null get_timesamples_ptr
for enums) — both caught by the roundtrip (extent + enum/purpose files) and fixed here.

Builds clang + gcc; ctest 14/14; roundtrip 560 equivalent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lic API)

GeomPrimvar::_ts_indices: TypedTimeSamples<std::vector<int32_t>> -> value::TimeSamples.
The two ctors, get_timesampled_indices() and set_timesampled_indices() are retyped to
value::TimeSamples (public-API break, authorized). usdGeom.cc:
- ReconstructGeomPrimvar sets the raw value::TimeSamples store directly (drops the
  from_timesamples typed round-trip; int[] element type is validated lazily on read).
- set_indices() replaces the dangling get_sample_at()-mutate with add_sample()
  (value semantics; add_sample overwrites a sample at the same time).
- resolve_indices_at()'s _ts_indices.get(&buf, t, tinterp) now binds value::TimeSamples::get.
tydra/scene-access.cc: the two index-timesamples sites likewise set the raw store directly.

This was the last value-type TypedTimeSamples consumer in reachable code.

Builds clang + gcc; ctest 14/14; roundtrip 560 equivalent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ntiations + 5 inst TUs (fix ODR + compile win)

The ODR bug: TypedTimeSamples<T>::get was defined out-of-line in TWO places with
DIFFERENT bodies — timesamples.cc (full blocked-sample handling) and
timesamples-get-impl.inc (incomplete: "FIXME use first item", no blocked handling) —
emitted as COMDAT by timesamples.cc and the 5 timesamples-inst-*.cc and merged
arbitrarily by the linker.

Fix + reap the win (hybrid-aware: the struct template stays for enum-valued Animatables):
- timesamples-get-impl.inc now holds the single, FULL get body (both overloads, AoS +
  the experimental SoA branch) with proper blocked handling.
- timesamples.hh #includes get-impl.inc after the struct, so get<> is header-inline and
  instantiated on demand for any T (incl. enums); adds value-eval-util.hh for lerp().
- timesamples.cc: deleted its out-of-line get bodies + the ~114 `template struct
  TypedTimeSamples<T>;` and ~117 `template ...::get<T>` explicit instantiations.
- timesamples.hh: deleted the extern-template block (struct + get).
- Deleted the 5 timesamples-inst-{scalar,scalar-role,array,array-role,array-basic}.cc
  and their CMakeLists.txt entries (meson/xmake never listed them).

Post-Phase-2 no value type uses TypedTimeSamples, so nothing instantiates the value-type
specializations anymore; enums instantiate the header-inline methods where used.

Result: timesamples.cc isolated compile ~47s -> 16.6s (clang -O3) + 5 TUs removed; ODR
fixed. Builds clang + gcc; ctest 14/14 (timesamples_test is the ODR tripwire); roundtrip
560 equivalent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- 4a: delete both dead get_sample_at() methods (value::TimeSamples + TypedTimeSamples).
  The value::TimeSamples one returned const_cast<Sample*>(&*it) — a raw pointer into a
  reallocatable vector (dangling footgun). No callers remained after Phase 2c; the only
  user was unit-timesamples.cc, rewritten to has_sample_at() + get<float>() at the exact
  sample time (same intent, no raw pointer escape).
- 4b: value::TimeSamples::_data_offsets uint32_t -> size_t and BLOCKED_OFFSET UINT32_MAX ->
  SIZE_MAX, removing the 4GB byte-offset ceiling (the data buffer can exceed 4GB). Propagated
  through sort_flat_storage()'s param + temp, the get_data_offsets() return type, the byte_offset
  locals and the memory-usage sizeof. _array_counts stays uint32_t (it holds element counts).
- 4c: DEFINE_ROLE_TYPE_TRAIT now static_asserts sizeof/alignof(role) == sizeof/alignof(underlying)
  for every role type (auto-covers all ~23 + future ones). This makes the shared TimeSamples
  switch arms and the role path of any_value_raw_cast provably memory-safe. All role types pass.
- 4d: move any_value_raw_cast (unchecked force-cast) into value::detail so it cannot be reached
  without intent; the 8 internal callers (Value::as/get role branches) now qualify it. The
  checked any_value_cast stays public (it has external callers in crate-writer / c-tinyusd).
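
The layout check in 4c can be sketched as below. CHECK_ROLE_LAYOUT, float3, color3f and
ToUnderlying are hypothetical names for this example; the real check lives inside
DEFINE_ROLE_TYPE_TRAIT, whose body is not shown here.

```cpp
#include <array>
#include <cstring>

using float3 = std::array<float, 3>;  // hypothetical underlying type
struct color3f {                      // hypothetical role type
  float r, g, b;
};

// Fails the build if a role type ever diverges from its underlying layout.
#define CHECK_ROLE_LAYOUT(Role, Underlying)                  \
  static_assert(sizeof(Role) == sizeof(Underlying) &&        \
                    alignof(Role) == alignof(Underlying),    \
                #Role " must match the layout of " #Underlying)

CHECK_ROLE_LAYOUT(color3f, float3);

// With the layout proven equal, a byte-wise copy between the two is safe.
inline float3 ToUnderlying(const color3f& c) {
  float3 out;
  std::memcpy(&out, &c, sizeof(out));
  return out;
}
```

Placing the assertion in the registration macro means a newly added role type is
checked automatically.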

Builds clang + gcc; ctest 14/14; roundtrip 560 equivalent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove 6 dead on-disk files that formed a closed cluster — referenced only by
each other, in no build file (CMake/meson/xmake), and #included by no live code:
  attribute-eval-typed-animatable-impl.inc
  attribute-eval-typed-animatable-inst-{array,scalar}.cc
  attribute-eval-typed-animatable-fallback-impl.inc
  attribute-eval-typed-animatable-fallback-inst-{array,scalar}.cc

These were a separate per-type animatable evaluation path that is obsolete now that
Animatable<T>::get() is thin (Phase 2): the live EvaluateTypedAnimatableAttribute<T>
in attribute-eval-typed-impl.inc already calls it directly.

Measurement (the Phase-5 gate): the live attribute-eval TUs compile (clang -O3) at
attribute-eval 5.8s, -typed-all 4.2s, -typed-inst-array 18.3s, -typed-inst-scalar 14.7s.
The ~33s in the two inst TUs is inherent to the ~108-type x 4-eval-function matrix; the
TUs are already split for parallelism (merging would hurt it) and the bodies are already
thin, so no further merge is pursued — the dead-code removal is the win.

Builds clang + gcc; ctest 14/14.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Delete uninstantiated, zero-caller helpers left over from the TypedTimeSamples
elimination (verified no callers in src or tests):
- primvar::PrimVar::set_typed_timesamples (both overloads) — superseded by
  set_timesamples(value::TimeSamples).
- Attribute::set_typed_timesamples (both overloads) — only delegated to the above.
- tydra::utils::ExtractAnimatableData declaration (never defined/called) + the
  now-unused TypedTimeSamples forward declaration in common-utils.hh.

TypedTimeSamples<T> now appears only where legitimately needed: the Animatable<T>
hybrid storage for enum/non-registered types, the usdShade enum get<> specializations,
the pprint enum printers, and the scene-access enum->typeless converter.

Builds clang + gcc; ctest 14/14.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tinyusdz.hh's own public API uses only Stage/Prim/Layer, but it re-exported the four
template-heavy schema headers (usdGeom/usdLux/usdShade/usdSkel), forcing every consumer
of the umbrella header to parse them. Remove those includes; a TU that only includes
tinyusdz.hh now compiles in 2.5s (was 3.4s, ~26% / -0.9s) — preprocessed 74228 -> 72170
lines. (The remaining ~69.5k-line floor is the shared value-types/prim-types type system,
pulled in unavoidably via prim.hh/stage.hh — a separate, larger lever.)

Public-API change: code that includes tinyusdz.hh and uses concrete schema types must now
include the relevant usd{Geom,Lux,Shade,Skel}.hh directly. Fixed the in-tree consumers that
relied on the re-export (found build-driven):
  src: json-to-usd.hh, usd-to-json.hh (covers .cc), tydra/raytracing-scene-converter.hh
       (covers .cc), usdMtlx.cc (+usdLux), usdObj.cc (GPrim)
  tests/examples: save_usda, pprint_benchmark, unit-physics

Builds clang + gcc (incl. tests/examples); ctest 14/14; roundtrip 560 equivalent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… + 3 inst-TUs

usdGeom.cc was the last unsplit monster TU (~29s isolated, clang -O3, no PCH). Its cost
was the GeomPrimvar per-type instantiation matrix over APPLY_GEOMPRIVAR_TYPE (47 types):
flatten_with_indices<T> (2 overloads = 94) + get_value<T> (4 overloads = 188) = ~282
explicit instantiations.

Following the established prim-reconstruct/scene-access split pattern:
- New src/usdGeom-primvar-impl.inc holds the template bodies (flatten_with_indices,
  get_value) + the anon-namespace index-expansion helpers (ExpandWithIndices[FromPtr],
  CopyBlockElements, can_use_* traits), #included INSIDE namespace tinyusdz.
- usdGeom.cc drops those bodies + the INSTANCIATE_* invocations and #includes the impl.inc
  (GeomMesh::get_normals still uses ExpandWithIndices). All non-template plumbing stays.
- usdGeom.hh splits APPLY_GEOMPRIVAR_TYPE into _SCALAR/_VEC/_ROLE group sub-macros (union =
  the same 47 types); the existing EXTERN_TEMPLATE_GET_VALUE (over the full list) already
  suppresses implicit instantiation in the 48 consumers, so this is transparent to them.
- 3 new sibling TUs usdGeom-primvar-inst-{scalar,vec,role}.cc each #include the impl.inc and
  instantiate their group. Registered in CMakeLists.txt + meson.build + xmake.lua (lockstep).
- pragma guard: -Wunused-function only (-Wunused-template is clang-only; gcc rejects it).

Isolated compile (clang -O3): usdGeom.cc 29.1s -> 8.9s; the matrix is now 3 parallel TUs
(scalar 9.7s / vec 10.0s / role 12.4s). Further partition refinement follows.

Builds clang + gcc; ctest 14/14; roundtrip 560 equivalent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Split each APPLY_GEOMPRIVAR_TYPE group macro into _A/_B leaf macros (SCALAR_A/B 7+7,
VEC_A/B 7+7, ROLE_A/B 10+9 = 47; the _SCALAR/_VEC/_ROLE and full APPLY_GEOMPRIVAR_TYPE
remain as unions). Replace the 3 inst-TUs with 6 (usdGeom-primvar-inst-{scalar,vec,role}-{a,b}.cc),
each instantiating one ~8-type leaf group via the shared usdGeom-primvar-impl.inc.
CMake/meson/xmake updated in lockstep.
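
The leaf/union macro structure can be sketched as follows. The APPLY_TYPES_* names,
ByteSize and the type lists are hypothetical and much shorter than the 47-type list
described above.

```cpp
#include <cstddef>

struct Vec2 { float v[2]; };
struct Vec3 { float v[3]; };

// Leaf lists: each type appears in exactly one leaf.
#define APPLY_TYPES_SCALAR_A(X) X(int) X(float)
#define APPLY_TYPES_SCALAR_B(X) X(double)
#define APPLY_TYPES_VEC(X) X(Vec2) X(Vec3)

// Unions are composed from the leaves, never retyped by hand.
#define APPLY_TYPES_SCALAR(X) APPLY_TYPES_SCALAR_A(X) APPLY_TYPES_SCALAR_B(X)
#define APPLY_TYPES_ALL(X) APPLY_TYPES_SCALAR(X) APPLY_TYPES_VEC(X)

template <typename T>
std::size_t ByteSize(std::size_t count) {
  return count * sizeof(T);
}

// An instantiation TU expands only its own leaf list.
#define INSTANTIATE_BYTE_SIZE(T) template std::size_t ByteSize<T>(std::size_t);
APPLY_TYPES_SCALAR_A(INSTANTIATE_BYTE_SIZE)

// The same lists can drive other expansions, e.g. counting entries.
#define COUNT_ONE(T) +1
constexpr int kNumScalar = 0 APPLY_TYPES_SCALAR(COUNT_ONE);
constexpr int kNumAll = 0 APPLY_TYPES_ALL(COUNT_ONE);
```

Because the full list is defined as the union of the leaves, an extern-template
declaration over the full list and the per-leaf instantiation TUs cannot drift apart.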

Isolated compile (clang -O3): the 6 matrix TUs are now 6.7-8.4s (was 9.7/10.0/12.4s in the
3-way). usdGeom.cc itself is 8.7s and is now the family's heaviest — that residual is
NON-template plumbing (ValidateTopology/ValidateSubsets/ComputeExtent), not the instantiation
matrix, so further matrix-splitting can't lower it. ~8s is the -O3 back-end codegen floor
here (the prior split TUs run ~25s at -O3, so usdGeom is now well within the codebase norm).

Builds clang + gcc; ctest 14/14; roundtrip 560 equivalent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the non-joint physics specializations (MjcActuator, NewtonActuator, MjcTendon,
MjcKeyframe, PhysicsCollisionGroup) + their PrimSpec wrappers into a new sibling
prim-reconstruct-physics2.cc (the joint helpers HasPropertyPrefix/ReconstructJointBaseProperties
stay in physics.cc — physics2 uses neither). Each PrimSpec wrapper stays with its PropertyMap
specialization so the explicit-specialization calls resolve in-TU.

Isolated compile (clang -O3): physics.cc 26.9s -> 23.2s, physics2.cc 10.0s. NOTE this 2-way
split is unbalanced — the 6 joints + Scene dominate (~3s each), so physics.cc is still 23.2s;
finishing it needs a ~4-way split with the helpers hoisted to a shared -detail.inc.

CMake-only (meson/xmake do not list prim-reconstruct-physics.cc — pre-existing).
Builds clang + gcc; ctest 14/14; roundtrip 560 equivalent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier 2-way split left physics.cc at 23.2s (6 joints + Scene). Hoist
the two joint helpers (HasPropertyPrefix, ReconstructJointBaseProperties)
into prim-reconstruct-physics-detail.inc, #included inside namespace prim by
each joint-bearing TU (internal linkage per TU), then distribute the joint
specializations:
  physics.cc  -> Joint, Scene            (13.9s)
  physics3.cc -> Revolute/Prismatic/Spherical joints (11.1s)
  physics4.cc -> Fixed/Distance joints   (9.2s)
  physics2.cc -> actuators/tendon/keyframe/collisiongroup (10.0s, unchanged)
Each spec's PrimSpec wrapper stays in its TU. Also drops the leftover orphan
"// MjcActuator" comment header in physics.cc. CMake-only (meson/xmake do not
list the physics TUs). clang+gcc clean, ctest 14/14, roundtrip 560/2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
syoyo and others added 21 commits May 28, 2026 19:47
…rsal, composition arcs

- Prim is now a proper Python heap type (replaces dict return)
  with name/path/type_name properties and get_property/has_property/
  get_property_names/get_properties methods
- Added 12 missing C API getters: float2, double2/3/4, int64,
  uint32, uint64, double/int64/uint/uint64 array, property names
- Fixed C API type detection bug: array types (e.g. point3f[])
  were reported as scalar FLOAT3 instead of FLOAT_ARRAY,
  causing all array property values to return None
- Stage: traverse() returns list of Prims; get_root_prims() added;
  5 composition arc methods exposed (add_reference/payload/inherit/
  specialize, set_variant_selection)
- Complete type stubs (_core.pyi)
- All 8 next-library C++ tests + C API 10/10 pass
- test_schemas_ext: 30 tests covering all 11 committed schema modules
  (Skel, AR, MaterialX, Media, Physics scene/joint/collision/api)
- CMake target 'tinyusdz_next' with TINYUSDZ_NEXT_BUILD_PYTHON=ON
  builds _next_core.so via find_package(Python3)
- All 11 test executables + C API 10/10 tests pass
- test_usdcat_roundtrip: writes Stage to USDC, reads back with
  next library (passes) + pxrUSD usdcat (passes, 0 prims documented)
- Extended schema tests (test_schemas_ext) now part of CMake build
- All 10 next-library test executables pass
- usdz-writer.{hh,cc}: ZIP archive builder with 64-byte alignment,
  CRC32, local file header, central directory, EOCD
- WriteUSDZToFile / WriteUSDZToMemory: convert Stage to USDC then
  wrap in ZIP as root.usdc
- WriteUSDZFromUSDCToFile / WriteUSDZFromUSDCToMemory: skip Stage
  serialization for already-serialized USDC data
- test_usdz_writer: 6 tests (memory, file, readback with USDZReader,
  Stage load through LoadUSD, USDC extraction + readback,
  roundtrip from USDC)
- All 13 next-library test executables + C API 10/10 + Python pass
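The 64-byte alignment mentioned for the ZIP builder can be illustrated with a minimal sketch. It assumes the standard 30-byte fixed local file header followed by the file name and an extra field used purely as padding; `ExtraPaddingFor64ByteAlignment` is a hypothetical helper, not the usdz-writer API.

```cpp
#include <cassert>
#include <cstdint>

// Size of the fixed part of a ZIP local file header (signature through the
// extra-field-length field). The file name and extra field follow it.
constexpr std::uint64_t kLocalHeaderFixedSize = 30;
constexpr std::uint64_t kUsdzAlignment = 64;

// Hypothetical helper: returns how many padding bytes must be placed between
// the file name and the file data so the data starts on a 64-byte boundary.
// header_offset is the archive offset at which the local header begins.
std::uint64_t ExtraPaddingFor64ByteAlignment(std::uint64_t header_offset,
                                            std::uint64_t name_len) {
  assert(name_len <= 0xFFFFu);  // ZIP stores the name length in 16 bits
  const std::uint64_t data_start =
      header_offset + kLocalHeaderFixedSize + name_len;
  const std::uint64_t rem = data_start % kUsdzAlignment;
  return rem == 0 ? 0 : kUsdzAlignment - rem;
}
```

For `root.usdc` (9 bytes) at archive offset 0 the data would start at byte 39, so 25 padding bytes are needed. A real writer likely also has to account for the 4-byte id/size header that every ZIP extra-field record carries, which this sketch ignores.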
- add_test() for all 11 test executables + benchmark
- ctest -R next_* runs the full next test suite
- 100% pass, 0 failures
- Writer now writes primChildren TokenListOp on pseudo-root spec
  listing root prim names, matching pxrUSD format expectation
- File parses successfully in pxrUSD usdcat (Exit: 0)
- Stage metadata (defaultPrim, upAxis, etc.) renders correctly
- Prim definitions still not visible — hierarchy reconstruction
  requires additional format investigation
- InternToken(";-)") now called first in BuildTables, making it
  token 0 matching pxrUSD's CrateFile convention
- Empty string root marker shifted to token 1
- Reader uses elem.empty() string check (index-independent),
  so token index change is transparent
- All 12 next-library tests + C API 10/10 pass
- pxrUSD usdcat parses our files successfully (exit 0)
  but prims still not visible — requires deeper crate format
  investigation
- Reorder prim fieldset: specifier now comes before typeName
  matching pxrUSD's CrateFile order expectation
- Change primChildren from TokenListOp to TokenVector (41)
  matching pxrUSD's on-disk format for this field
- All 12 next-library tests + C API 10/10 pass
- pxrUSD sdfdump validates our files as OK but still
  doesn't show prim specs — needs crateFile.cpp source
  analysis for remaining format mismatch
C API additions:
- tinyusdz_next_prim_get_bool_array: bool[] array accessor
- tinyusdz_next_prim_get_token_array: token[] array accessor
- Type detection for Bool[] and Token[] arrays

Python bindings additions:
- Prim.get_children(): returns list of child Prim objects
- Prim.get_relationship(name): returns list of target path strings
- bool[] and token[] support in _make_value

Bug fix:
- TokenVector (type 41) stores raw count+indices without ListOp header byte
  (was causing usdcat out-of-bounds read)

All 12 ctest tests + C API 10/10 + Python module pass
Python additions:
- Prim.has_relationship(name) -> bool
- Prim.has_time_samples(name) -> bool
- Prim.eval_float(name, time) -> Optional[float]
- Prim.eval_float3(name, time) -> Optional[tuple]
- test_basic.py: 15 Python tests covering all Prim/Stage methods

C API test expanded from 10 to 15 tests covering:
- get_property_names, get_relationship_targets
- get_float_array, get_int32_array
- composition arc functions

All 12 ctest + 15 Python + 15 C API tests pass
Writer changes:
- Add primChildren TokenVector to non-root prims with children
- Uses path prefix matching to find child prims

Reader changes:
- BuildStage now reconstructs prim hierarchy from paths
- Sorts prims by full path (depth-first order)
- Uses depth-based stack management with begin/end_prim
- All prims now have correct parent-child relationships

pxrUSD usdcat now correctly reads and outputs the full prim
hierarchy with all properties. First time this milestone
has been reached.

All 12 ctest tests pass
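The path-sorted, depth-based stack reconstruction described for BuildStage can be sketched roughly as follows. This is an illustration under assumptions, not the reader's code: `Node` and `BuildHierarchy` are hypothetical names, paths are assumed to be absolute with identifier-only prim names, and every ancestor is assumed to be present in the input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Node {
  std::string path;
  int parent = -1;  // index into the returned array, -1 for root prims
};

// Depth = number of '/' separators in an absolute prim path ("/A/B" -> 2).
static std::size_t Depth(const std::string &p) {
  return static_cast<std::size_t>(std::count(p.begin(), p.end(), '/'));
}

// Sorting by full path yields depth-first order because '/' sorts below
// every identifier character, so "/A/B" comes before "/A1".
std::vector<Node> BuildHierarchy(std::vector<std::string> paths) {
  std::sort(paths.begin(), paths.end());
  std::vector<Node> nodes;
  std::vector<int> stack;  // stack[d - 1] = index of the open prim at depth d
  for (const std::string &p : paths) {
    assert(!p.empty() && p[0] == '/');
    const std::size_t d = Depth(p);
    // "end_prim": close prims until only the ancestors of p remain open.
    while (!stack.empty() && stack.size() >= d) stack.pop_back();
    Node n;
    n.path = p;
    n.parent = stack.empty() ? -1 : stack.back();
    // "begin_prim": p becomes the innermost open prim.
    stack.push_back(static_cast<int>(nodes.size()));
    nodes.push_back(n);
  }
  return nodes;
}
```

If an intermediate ancestor were missing from the input, this sketch would attach the prim to the nearest open ancestor, so a real implementation would need to decide how to handle that case.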
TimeSamples reader:
- Parse TimeSamples indirection format from VALUE section:
  [i64 fwd1][ValueRep times_rep][i64 fwd2][u64 count][ValueRep samples[N]]
- Follow times_rep to read double array of sample times
- Unpack each sample value via UnpackValue
- Store as a Token with sample count (per-property association pending)

Composition arc reader:
- Extract references, payloads, inherits, specializes, variantSelection
  from prim fieldset fields and store in PrimSpecMeta
- These are no longer added as regular properties on the prim;
  they are stored as metadata instead

All 12 ctest + 15 C API + 15 Python tests pass
New unpackers for previously unsupported CrateTypeIds:
- Vec2i, Vec3i, Vec4i: integer vector types (8/12/16 bytes)
- Half: 16-bit float with inline/offset support + half_to_float conversion
- Vec2h, Vec3h, Vec4h: half-precision vector types → float2/3/4
- Quath: half-precision quaternion → Quatf

All registered in the UnpackValue switch. Closes remaining
type coverage gaps for scalar/vector crate types.

All 12 ctest tests + 15 C API + 15 Python tests pass
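For reference, a half-to-float conversion of the kind the Half/Vec*h/Quath unpackers rely on can be written as below. This is a generic IEEE-754 binary16 decoder sketched for illustration, not the library's `half_to_float`; the name `HalfToFloat` is an assumption. The subnormal branch is the part that is easy to get wrong: the exponent has to be adjusted while normalizing, and the leading mantissa bit has to be cleared once it becomes implicit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Decodes an IEEE-754 binary16 bit pattern into a float (binary32).
float HalfToFloat(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  std::uint32_t man = h & 0x3FFu;
  std::uint32_t bits = 0;
  if (exp == 0x1Fu) {
    // Inf or NaN: all-ones exponent, mantissa carried over.
    bits = sign | 0x7F800000u | (man << 13);
  } else if (exp != 0) {
    // Normal number: rebias the exponent from 15 to 127 (add 112).
    bits = sign | ((exp + 112u) << 23) | (man << 13);
  } else if (man == 0) {
    bits = sign;  // signed zero
  } else {
    // Subnormal: value = man * 2^-24. Shift left until bit 10 (the would-be
    // implicit bit) is set, lowering the exponent by one per shift.
    std::uint32_t e = 113;  // biased binary32 exponent of 2^-14
    while ((man & 0x400u) == 0) {
      man <<= 1;
      --e;
    }
    man &= 0x3FFu;  // clear the leading bit, which is now implicit
    bits = sign | (e << 23) | (man << 13);
  }
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

Every binary16 value is exactly representable in binary32, so the conversion involves no rounding and can be checked exhaustively over all 65536 bit patterns.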
…adata

- Add variability (Uniform) field for properties with kFlagUniform
- Write variantSets names as comma-separated string field
- Write inherits, specializes, comment metadata fields
- Write multiple references and payloads (semicolon-separated)
- Write doc, comment, subLayers to pseudo-root layer metadata
- All 12 ctest tests pass
- C API: tinyusdz_next_prim_get_relationship_names enumerates all
  relationship names on a prim
- PrimSpec::relationship_names() added for map key iteration
- UsdPrim::GetRelationshipNames() exposed on Stage API
- Python Prim.get_relationship_names() returns list of names
- C API test + Python test coverage
- All 12 ctest + 15 Python tests pass
Reviewed every doc against current source and fixed drift in place:
- Rewrite pcp.md (was ~95% hallucinated APIs) to the real composition_graph
  engine; rewrite refactor-opportunities.md as a completed-work record
  (TimeSamples/Animatable refactor is done); trim tydra-animation-spec-en.md
  of never-implemented structs/methods.
- Fix wrong/stale facts: api-status summary totals (API 31/42, overall 86/111),
  materialx free-vs-member conversion fns, composition PCP task-order vs strength
  wording, crate-writer test count, plus stale paths/line-numbers, API names,
  status tables (usdLux, testing-cpp disabled tests, mcp catalog, ci version);
  mark lte_spectral_api.md DRAFT.
- Document the USDC dictionary (customData) recursive-offset binary format in
  crate-writer.md; note Blender rotation sockets are radians in materialx.md.

Merge tight clusters (flat folder kept):
- instancing + variant -> composition.md
- openpbr-parameters-reference -> materialx.md
- blender-physics -> usd-physics.md
- usdObj -> api-status.md
- PACKED_ARRAY_OPTIMIZATION + durable memory-usage-tasks -> new memory-and-performance.md

Delete stale/transient/redundant docs (DICTIONARY_FORMAT_INVESTIGATION, branches,
c-py-tasks, blender_shader_nodes_mtlx) and the non-functional mkdocs.yml. Keep
wine_cl.md (cleaned) — it documents the current clang-cl/MSVC + WINE cross-compile
toolchain; now linked from README Platform Notes.

Fix all inbound doc links (README, AGENTS, src/tydra/README, web/js/docs);
regenerate static/pcp.html from the corrected pcp.md.

doc/: 33 -> 23 .md files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review-driven correctness, hardening and optimization pass over the USDC crate
reader/writer (src/next/crate) and the supporting Value / PrimSpec / Python-C-API
layers.

Correctness:
- half_to_float: fix subnormal decode (wrong exponent bias + implicit leading
  mantissa bit not cleared); now exact vs IEEE-754 reference for all 65536 halfs.
- UnpackArray: decode compressed int[]/uint[] (delta+LZ4) via
  DecompressCompressedU32 — large int arrays (faceVertexIndices, ...) no longer
  read back as garbage (writer compresses by default).
- TimeSamples writer: write each sample's full ValueRep verbatim so inlined
  scalars and array-valued samples encode correctly (was storing inline value
  bits as a value-block index); verified with pxr usdcat.
- TimeSamples reader: validate the header and skip cleanly instead of fabricating
  a placeholder token property.
- Value::copy_from/destroy: handle TypeId::Token arrays. token[] values were
  left uninitialized (and leaked) on copy, so token[] properties (e.g.
  xformOpOrder) did not round-trip.
- Composition arcs / list metadata (references/payloads/inherits/specializes/
  subLayers/variantSets) written and read as token arrays — full multi-element
  round-trip rather than first-only / "; "-joined.
- BuildStage: consume variability/timeSamples/comment/variantSets so they no
  longer leak as stray properties; guard builder.current().
- relationship_names(): deterministic (sorted) order.
- Vec2h/Vec3h: decode from the inline payload; reject the inline bit on
  >6-byte vector/int types instead of seeking to a bogus offset.

Hardening (untrusted .usdc must not crash/over-read/over-allocate):
- StreamReader: overflow-safe bounds (count > size_ - pos_) and
  check-before-resize across read/skip/read_fixed_string/array helpers.
- UnpackArray: element-count cap (max_array_elements), overflow-safe sizing,
  empty-array (payload==0) handling.
- ReadTOC start+size int64 overflow; FIELDSETS/SPECS size<8 underflow;
  decompressed-size guards before every memcpy; eliminate alloc-before-check.
- ReadPaths: max_paths cap; fix INT32_MIN negation UB.
- CrateReadOptions: add max_paths and max_array_elements.
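The overflow-safe bounds form `count > size_ - pos_` can be illustrated with a small reader. `BoundedReader` is a hypothetical class written for this sketch, not the actual StreamReader; it only assumes the invariant `pos_ <= size_`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

class BoundedReader {
 public:
  BoundedReader(const std::uint8_t *data, std::size_t size)
      : data_(data), size_(size) {}

  // Copies count bytes into dst, or returns false without side effects.
  // The subtraction size_ - pos_ cannot underflow because pos_ <= size_,
  // whereas the naive form `pos_ + count > size_` wraps for a huge count
  // and would let the read through.
  bool Read(void *dst, std::size_t count) {
    assert(pos_ <= size_);
    if (count > size_ - pos_) return false;
    if (count > 0) std::memcpy(dst, data_ + pos_, count);
    pos_ += count;
    return true;
  }

  std::size_t pos() const { return pos_; }

 private:
  const std::uint8_t *data_;
  std::size_t size_;
  std::size_t pos_ = 0;
};
```

The same comparison shape applies to skip and array helpers: validate the requested count against the remaining bytes first, and only then resize or allocate.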

Optimization:
- primChildren reconstruction O(P^2) -> O(P) via a parent->children map.
- Release the fieldset cache after BuildStage; reserve token-section buffer;
  bulk-read token/bool arrays.
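The parent->children map behind the O(P^2) -> O(P) change might look like the sketch below. It replaces a per-prim prefix scan over all paths with a single pass that files each prim under its parent; `ParentPath` and `BuildChildrenMap` are hypothetical names, and absolute prim paths are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the parent of an absolute prim path ("/A/B" -> "/A", "/A" -> "/").
std::string ParentPath(const std::string &path) {
  const std::size_t pos = path.rfind('/');
  if (pos == std::string::npos || pos == 0) return "/";
  return path.substr(0, pos);
}

// One pass over all paths: each prim name is appended to its parent's list,
// so no prim has to scan the full path set to find its children.
std::unordered_map<std::string, std::vector<std::string>> BuildChildrenMap(
    const std::vector<std::string> &paths) {
  std::unordered_map<std::string, std::vector<std::string>> children;
  for (const std::string &p : paths) {
    assert(!p.empty());
    if (p == "/") continue;  // the pseudo-root has no parent
    const std::string name = p.substr(p.rfind('/') + 1);
    children[ParentPath(p)].push_back(name);
  }
  return children;
}
```

Children keep the order in which their paths appear in the input, so a caller that needs a specific primChildren order should feed the paths in that order.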

Python/C-API: null-check PyList_New/PyUnicode_FromString in
get_relationship_names; align its cap with get_property_names (512).

Verified: 12/12 next tests pass, targeted write->read round-trip checks
(compressed int[], token[] properties, multi-ref/subLayers, time-sampled scalar
+ array), pxr usdcat cross-read of time samples, and a malformed-input fuzz pass
(truncation / huge counts / overflowing offsets) with no crashes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Security and memory-safety fixes across the untrusted-input paths
(USDC binary, tydra mesh conversion, wasm image decoders):

- lz4: validate per-chunk size against the remaining input and track
  consumed bytes. A crafted chunkSize larger than the input could make
  LZ4_decompress_safe read past the buffer (OOB read) in the multi-chunk
  path. Also drop dead debug scaffolding / unused includes.
- tydra/render-data-mesh: fix off-by-one in faceVertexIndices validation
  (`> points.size()` -> `>=`); an index equal to the point count passed
  validation and then indexed points[] out of bounds. Add a defensive
  bounds check in the quad-split triangulation path.
- tydra/render-data-mesh: use safe::mul for triangulation stride
  multiplies; guard face-varying source reads; fix a pre-existing OOB
  read in the constant-variability path (a constant attribute holds one
  element, so read offset 0 instead of src_data + f*stride).
- usdc-reader-property: reject out-of-range `elementSize` (was a warning
  only) since it feeds downstream stride computations.
- wasm binding: bound decoded image dimensions and compute pixel counts
  with overflow checking (critical on wasm32 where size_t is 32-bit);
  guard the reinterpret_cast image paths against truncated buffers.
  getAsset()/getAssetByUUID() now return JS-owned copies instead of a
  typed_memory_view that dangles after cache eviction; the zero-copy view
  accessor is retained and documented. Add sanity checks to
  addFromRawPointer().
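The faceVertexIndices off-by-one above comes down to the comparison operator. A minimal sketch of the corrected check, using a hypothetical `ValidateFaceVertexIndices` helper rather than the tydra code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Valid indices are 0 .. num_points - 1. Rejecting only `idx > num_points`
// lets idx == num_points through, which then reads one element past the end
// of points[]; the check therefore has to use `>=`.
bool ValidateFaceVertexIndices(const std::vector<std::uint32_t> &indices,
                               std::size_t num_points) {
  for (const std::uint32_t idx : indices) {
    if (idx >= num_points) return false;
  }
  return true;
}
```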

Verified: ctest 14/14; under ASAN the unit tests, USDC roundtrip corpus
(371), USDC parse corpus (198) and failure cases are clean; LZ4 and mesh
fixes confirmed with targeted ASAN repros; wasm builds under emscripten.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Several tydra bounds checks computed an offset as `elementSize * index`
in 32-bit unsigned arithmetic (both operands uint32_t, so the product
wraps at 2^32), while the matching array access cast to size_t first and
was computed in 64-bit. A wrapped 32-bit product could pass the
`>= size()` guard while the access used the true (huge) offset, reading
out of bounds. With max_skin_elementSize = 262144, the wrap is reachable
once a vertex index reaches 16384 (any skinned mesh with >=16K
points/face-verts), independent of platform.
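The wrap described above is easy to reproduce: 262144 * 16384 is exactly 2^32, which is 0 in 32-bit unsigned arithmetic. The sketch below contrasts the wrapped product with an overflow-checked span test. `CheckedMul` is a hypothetical stand-in for an overflow-checked multiply such as safe::mul (whose real signature is not shown here), and `SpanInBounds` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// The buggy shape: both operands are uint32_t, so the product wraps mod 2^32.
std::uint32_t WrappedOffset(std::uint32_t element_size, std::uint32_t index) {
  return element_size * index;
}

// Hypothetical overflow-checked multiply: returns false instead of wrapping.
bool CheckedMul(std::size_t a, std::size_t b, std::size_t *out) {
  assert(out != nullptr);
  if (a != 0 && b > std::numeric_limits<std::size_t>::max() / a) return false;
  *out = a * b;
  return true;
}

// True only if the whole span [off, off + element_size) with
// off = element_size * index lies inside an array of array_size elements.
bool SpanInBounds(std::uint32_t element_size, std::uint32_t index,
                  std::size_t array_size) {
  std::size_t off = 0;
  if (!CheckedMul(element_size, index, &off)) return false;
  if (off > array_size) return false;
  return element_size <= array_size - off;
}
```

With element size 262144 and index 16384 the wrapped offset is 0, so a guard of the form `offset >= size()` passes for any non-empty array, while the checked version rejects the access on both 32-bit and 64-bit `size_t`.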

Fixes:

- Joint index/weight reorder bounds checks in ReorderVertexVaryingAttributes
  (render-data-mesh.cc), the vertex path of BuildVertexIndicesImpl and the
  face-varying path of BuildVertexIndicesFastImpl (render-index-builder.cc):
  compute the source offset via safe::mul and validate the full
  [off, off+elementSize) span. This also closes a secondary read of up to
  elementSize-1 elements past the end when a malformed
  jointIndices/jointWeights size is not a multiple of elementSize.

- Vertex-attribute count validation (ArrayValueToVertexAttribute,
  render-data-mesh.cc): the `value_counts != elementSize * num_*` checks
  for Uniform/Vertex/Varying/FaceVarying were 32-bit products that a
  crafted array sized to the wrapped value could satisfy. Compute the
  expected count with safe::mul (error on overflow) and use uint64_t in
  the diagnostic message.

- Joint-reorder temp-vector allocations sized via safe::mul to avoid
  truncation on 32-bit/wasm32.

Hardening (defense-in-depth, not reachable from parse paths):

- image-util.cc colorspace converters computed width*height*channel_stride
  before resize()/indexed loops without an overflow check. Replaced 21
  sites with safe::mul3 (matching the decoders in image-loader.cc). The
  nested loops index with real width/height, so an overflowing product
  would under-size the buffer and read/write OOB; reachable only via direct
  API misuse on wasm32, but fixed for consistency.

No behavior change for valid inputs: the products fit in size_t, so safe::mul
never fails, and the span checks never reject when size == elementSize*N.
This also fixes correctness for legitimately large meshes whose product exceeds
2^32.

Verified: native ctest 14/14; ASAN unit-test + 663-file corpus with 0
sanitizer reports; tydra_to_renderscene failure lists byte-identical to
baseline (0 regressions) across models/ and tests/usda/; USDC unit 198,
USDC roundtrip 371/371, USDA roundtrip pass; usdcat-compare 194 equivalent
/ 0 different / 1 XFAIL; clean wasm32 build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nsion

Optimize the hot paths of RenderSceneConverter mesh conversion. Output is
byte-identical (verified: scene dumps over the 663-file models/ + tests/usda
corpora match the pre-change build in both default and --calctangent modes,
once embedded __LINE__ numbers in diagnostics are normalized; the dedup
benchmark reports identical unique-vertex counts).

1. Position-bucketed vertex dedup (render-data-mesh-tangent.cc,
   ComputeTangentsAndBinormals): replace `vector<vector<BucketEntry>>` —
   which made one heap allocation per position plus per-bucket reallocations —
   with per-position intrusive singly-linked lists over a single flat arena
   (head[]/tail[] + a grown entry arena). Traversal stays in insertion order
   via tail-append, so the first match is identical to the old code even with
   the tolerant, non-transitive attribs_match (dedup_eps > 0); vertex_indices[]
   are unchanged.

   Measured (standalone bench_mesh_build, 1M points, nrm+uv0):
     - smooth (shared normals, the common case): 127 -> 82 ms (~-36%),
       bucket scratch 30.5 -> 19.1 MB (~-37%).
     - flat (per-triangle normals): 260 -> 227 ms (~-13%); bucket scratch is
       higher, 83.8 -> 99.2 MB (the extra link field + vector growth
       overshoot), but this arena is block-scoped and freed immediately after
       dedup, before the per-vertex tangent accumulation, so persistent output
       memory is unchanged.

2. UniformToFaceVarying (render-data-mesh.cc): the output was grown with a
   per-face-vertex insert() (repeated reallocation). Pre-size it once to the
   exact byte count (sum(faceVertexCounts) * stride, guarded by safe::mul/add)
   and write via offset memcpy.

3. bench_mesh_build.cc: port the same linked-list dedup so the standalone
   benchmark keeps mirroring the library algorithm.
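The arena layout from item 1 can be sketched as follows: one flat entry array shared by all buckets, with head[]/tail[] indices per bucket and tail-append so traversal stays in insertion order. `BucketArena`, `Entry` and the payload field are hypothetical and much simpler than the real BucketEntry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kNil = 0xFFFFFFFFu;  // "no entry" marker

struct Entry {
  std::uint32_t vertex;  // payload: an output vertex id
  std::uint32_t next;    // next entry in the same bucket, or kNil
};

class BucketArena {
 public:
  explicit BucketArena(std::size_t num_buckets)
      : head_(num_buckets, kNil), tail_(num_buckets, kNil) {}

  // Appends at the tail so each bucket is traversed in insertion order.
  void Append(std::size_t bucket, std::uint32_t vertex) {
    assert(bucket < head_.size());
    const std::uint32_t idx = static_cast<std::uint32_t>(entries_.size());
    entries_.push_back({vertex, kNil});
    if (head_[bucket] == kNil) {
      head_[bucket] = idx;
    } else {
      entries_[tail_[bucket]].next = idx;
    }
    tail_[bucket] = idx;
  }

  // Returns the first vertex in the bucket accepted by match, or kNil.
  template <typename Pred>
  std::uint32_t FindFirst(std::size_t bucket, Pred match) const {
    assert(bucket < head_.size());
    for (std::uint32_t i = head_[bucket]; i != kNil; i = entries_[i].next) {
      if (match(entries_[i].vertex)) return entries_[i].vertex;
    }
    return kNil;
  }

 private:
  std::vector<std::uint32_t> head_, tail_;
  std::vector<Entry> entries_;  // single allocation shared by all buckets
};
```

Because links are indices rather than pointers, the arena can grow by reallocation without invalidating any bucket, and first-match semantics are preserved even when the match predicate is tolerant and non-transitive.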

Verified: native ctest 14/14; byte-identical scene output across the corpora;
ASAN over 468 --calctangent meshes reports 0 issues; clean wasm32 build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These are regenerated by the build: `_next_core.so` is a ~3 MB
platform-specific CPython extension and `_core.pyi` is its generated type
stub. They should not live in source control. Untrack both and add
.gitignore rules, mirroring the existing python/tinyusdz/_core.abi3.so rule.
The local files are kept on disk (git rm --cached only) so the editable
install keeps working.

Note: this only stops tracking them at the branch tip. The blobs still
exist in earlier history (introduced 2026-05-28 in 90add99 / 6fc4f13,
already on the remote); fully removing them requires a history rewrite.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@syoyo syoyo force-pushed the refactor-2026may branch from d730ae4 to 3345d6b Compare May 29, 2026 16:29
syoyo and others added 8 commits May 30, 2026 15:04
…test suite

Extend the AOUSD Core validator (src/usd-validation.{hh,cc}) with selectable
rule groups via a new ValidationOptions{core, geom, shade}. The no-arg
ValidateLayerAgainstAOUSDCore overload is preserved (core-only), so existing
behavior is unchanged; geom/shade are opt-in.

New rules:
- core: layer metadata sanity (metersPerUnit/timeCodesPerSecond/framesPerSecond
  > 0, startTimeCode <= endTimeCode, upAxis), prim-name identifier check, and
  xformOpOrder integrity.
- geom: geom.encapsulation.nestedGprim (a Gprim must not nest under a Gprim).
- shade: shader/material encapsulation, material:binding relationship + applied
  MaterialBindingAPI, and UsdPreviewSurface input schema conformance
  (shade.preview.inputType / shade.preview.unknownInput).

Locality-dependent rules are gated on composition arcs / over specifiers so they
do not false-positive on a single uncomposed layer. Also fix CollectAppliedSchemas
to surface every known API schema by name (previously only CollectionAPI), which
MaterialBindingAPI detection relies on.

Interfaces:
- tusdcat: new --validate-all flag (core + geom + shade).
- MCP: new usd_validate tool (validates base64 data / file uri / session layer
  uuid / current session stage; selectable `groups`). Documented in doc/mcp.md.

Single source of truth for UsdPreviewSurface types:
src/usd-preview-surface-schema.json drives both the validator's table
(src/usd-preview-surface-inputs.inc, generated by
scripts/gen-usd-preview-surface-inputs.py) and the test fixtures. The
preview-surface-schema-sync CTest fails if the generated .inc is stale.

Tests:
- tests/usda/validation/: 20 self-checking fixtures (hand-written + generated)
  with `# EXPECT:` markers, a runner (run-validation-suite.py) wired into CTest
  as validation-suite-test, and a schema-driven fixture generator.
- Unit tests (unit-usd-validation.cc) and MCP tests (mcp-test-validate.cc) for
  every new rule, including no-false-positive gating cases.

Also fix a pre-existing native MCP build break: src/tydra/mcp-server.cc used the
DCOUT macro without including common-macros.inc, so it failed to compile with
TINYUSDZ_WITH_MCP_SERVER=ON.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
FormatValidationResult now produces a more readable, more honest report:
- States which rule groups were actually checked ("Checked rule groups: core")
  so a core-only PASS is not mistaken for full coverage. USDValidationResult
  carries the ValidationOptions it was produced with.
- Sorts issues deterministically (errors before warnings, layer-scoped issues
  first, then by location and rule id) and aligns the fixed-width severity.
- Summary line reports a PASSED / PASSED with warnings / FAILED status with
  properly pluralized counts instead of "N error(s), M warning(s)".
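The ordering and pluralization rules listed above can be sketched like this. The `Issue` fields, the convention that an empty location means a layer-scoped issue, and the helper names are assumptions made for illustration; they are not the USDValidationResult types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

enum class Severity { Error = 0, Warning = 1 };  // errors sort first

struct Issue {
  Severity severity;
  std::string location;  // assumed empty for layer-scoped issues
  std::string rule_id;
};

// Errors before warnings, layer-scoped issues first, then location, rule id.
void SortIssues(std::vector<Issue> &issues) {
  std::stable_sort(issues.begin(), issues.end(),
                   [](const Issue &a, const Issue &b) {
                     const bool a_prim = !a.location.empty();
                     const bool b_prim = !b.location.empty();
                     return std::tie(a.severity, a_prim, a.location, a.rule_id) <
                            std::tie(b.severity, b_prim, b.location, b.rule_id);
                   });
}

// "1 error", "2 errors", "0 warnings" instead of "N error(s)".
std::string Plural(std::size_t n, const std::string &noun) {
  return std::to_string(n) + " " + noun + (n == 1 ? "" : "s");
}
```

Sorting on a full tuple of keys makes the report independent of the order in which rules happened to run, which keeps the text stable for the unit and MCP tests.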

The usd_validate MCP tool now also returns `checked_groups`. Unit and MCP tests
cover the new report text and field.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
run-validation-regression.py sweeps `tusdcat --validate-all` over the entire
tests/usda fixture corpus (455 files, excluding fail-case/) and fails only if the
validator crashes or hangs. This complements the curated validation-suite-test
(which checks rule correctness on ~20 hand-picked fixtures) with broad robustness
coverage on real-world constructs. Wired into CTest as validation-regression-test.

Current corpus baseline: 0 crashes, 0 timeouts (427 clean, 25 with issues,
3 load-failures).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rable)

Unreal-exported (and other DCC) USD assets routinely author shader
properties whose Sdf type does not match the canonical type implied by
`info:id` / the UsdPreviewSurface spec, e.g. `token outputs:result` on a
UsdPrimvarReader_float2 (canonical: float2). tinyusdz raised a hard error
and aborted the load -- stricter than both the spec's intent and OpenUSD,
which never validates shader output types and resolves the real type from
the Sdr registry at render time.

Add a `strict_shader_type_check` option (default false). When false, such
a mismatch on a UsdPreviewSurface-family shader property is accepted with
a warning, keeping the canonical schema type for connection/render
semantics (matching OpenUSD). When true, it remains a hard parse error.

- New flag threaded USDLoadOptions -> USDA/USDC reader configs ->
  PrimReconstructOptions, reaching the parser on both load paths.
- Outputs: ParseShaderOutputTerminalAttribute<T> takes a strict flag; in
  permissive mode it records the authored type via set_actual_type_name
  and warns. Covers primvar readers, UsdUVTexture, UsdPreviewSurface,
  UsdTransform2d.
- Inputs: new PARSE_SHADER_INPUT_ATTRIBUTE macro warns and lets the
  trailing ADD_PROPERTY preserve the authored value as a custom property
  (no data loss). Applied to UsdPreviewSurface-family inputs only;
  MaterialX keeps its existing validate_mtlx_* controls.
- prim-reconstruct-impl.inc kept in sync with prim-reconstruct-common.inc.
- Add tests/usda/shader-nonconformant-types.usda repro fixture.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewrite doc/pcp.md into a canonical PCP (Prim Cache Population) reference
covering the AOUSD Core Spec v1.0.1 Section 10 composition model, OpenUSD's
PCP implementation (pxr/usd/pcp), and tinyusdz's mirroring composition_graph
DAG engine, tied together by a PCP<->AOUSD<->tinyusdz correspondence table.
The existing tinyusdz DAG API reference is preserved as a subsection.

Notable: the AOUSD spec uses "LIVERPS" (Local, Inherits, Variants, Relocates,
References, Payloads, Specializes), which includes Relocates in the mnemonic,
vs the older six-letter "LIVRPS"; both describe the same ordering.

Also tidy doc/composition.md: fix the PCP expansion ("Prim Composition
Protocol" -> "Prim Cache Population") and shrink its overlapping OpenUSD-PCP
section to a pointer to pcp.md, keeping the LIVRPS table, correctness
analysis, instancing, variants, and comparison table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>