UPSTREAM PR #1318: chore: replace rand and srand at the library level by loci-dev · Pull Request #76 · auroralabs-loci/stable-diffusion.cpp

loci-dev · 2026-03-05T04:15:59Z

Note

Source pull request: leejet/stable-diffusion.cpp#1318

These functions have global state, so they could interfere with application behavior.

It would arguably be more correct to use std::default_random_device, but that seemed a bit overkill for this.

loci-review · 2026-03-05T05:12:30Z

Overview

Analysis of stable-diffusion.cpp compared 49,765 functions across two versions, identifying 107 modified functions, 18 new functions, and 0 removed functions. The changes stem from a single commit replacing C-style rand/srand with C++ random number generation for improved thread safety and reproducibility.

Binaries Analyzed:

build.bin.sd-cli: +0.103% power consumption (491,105.58 → 491,612.53 nanojoules)
build.bin.sd-server: -0.107% power consumption (527,129.70 → 526,563.51 nanojoules)

Overall performance impact is negligible, with power consumption changes under 0.2% indicating effective performance neutrality despite individual function variations.

Function Analysis

std::vector::end() (build.bin.sd-cli): Throughput time increased 306.67% (59.77ns → 243.07ns, +183.30ns). Response time increased 223.91% (81.86ns → 265.16ns, +183.30ns). This STL function regression appears compiler-driven, likely from disabled inlining. While called frequently (411 uses), absolute impact remains modest.

std::vector<sd_lora_t>::end() (build.bin.sd-server): Throughput time improved 75.41% (243.07ns → 59.78ns, -183.29ns). Response time improved 69.44% (263.94ns → 80.65ns, -183.29ns). Compiler optimizations improved this LoRA parameter iteration function.

ggml_threadpool_params_default (build.bin.sd-cli): Throughput time improved 58.40% (217.48ns → 90.47ns, -127.01ns). Response time improved 45.46% (279.79ns → 152.59ns, -127.20ns). GGML submodule optimizations reduced threadpool initialization overhead.

ggml_compute_forward_map_custom3 (build.bin.sd-server): Throughput time improved 35.05% (219.25ns → 142.41ns, -76.84ns). Response time improved 32.91% (233.99ns → 156.98ns, -77.01ns). Custom operation handling benefits from more efficient RNG implementation.

apply_binary_op (build.bin.sd-cli): Throughput time improved 6.15% (1286.26ns → 1207.13ns, -79.13ns). Response time improved 4.26% (2362.80ns → 2262.11ns, -100.69ns). This frequently-called tensor addition operation shows modest but meaningful improvement.

Other analyzed functions showed mixed compiler-driven optimizations in STL operations (string construction, regex handling, vector reallocation) with changes ranging from -50% to +113%, but absolute impacts remained under 100ns per call.

Additional Findings

Core ML inference operations (matrix multiplication, convolution, attention) remain unchanged. Performance variations are predominantly compiler artifacts affecting peripheral functions (initialization, CLI parsing, memory management) rather than inference hot paths. The RNG replacement successfully achieves thread safety and reproducibility goals without compromising computational efficiency, as confirmed by near-zero net power consumption changes.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

These functions have global state, so they could interfere with application behavior.

loci-review · 2026-03-17T05:41:00Z

Overview

Analysis of 49,653 functions across two binaries reveals minimal performance impact from replacing legacy rand()/srand() with modern C++ random library. Modified: 109 functions (0.22%), New: 18, Removed: 0, Unchanged: 49,526 (99.74%).

Power Consumption:

build.bin.sd-server: 528,347.68 nJ → 527,829.49 nJ (-0.098%)
build.bin.sd-cli: 491,821.56 nJ → 492,215.29 nJ (+0.08%)

Net impact is negligible (<0.1% variation in both binaries).

Function Analysis

Most Significant Changes:

std::vector::begin() (TensorStorage) - build.bin.sd-server:

Response time: 83.36 ns → 264.17 ns (+180.81 ns, +216.9%)
Throughput time: 62.49 ns → 243.30 ns (+180.81 ns, +289.3%)
Regression from compiler-induced control flow reorganization (7→9 blocks, entry block +706%). No source changes to this STL function. Affects model loader TensorStorage iteration.

std::vector::back() - build.bin.sd-cli:

Response time: 452.50 ns → 262.66 ns (-189.84 ns, -41.9%)
Throughput time: 259.71 ns → 69.86 ns (-189.85 ns, -73.1%)
Improvement from block consolidation (12→10 blocks) and optimized entry execution. Used extensively in denoiser, U-Net, and utility modules.

std::shared_ptr::_M_destroy (FinalLayer) - build.bin.sd-cli:

Response time: 501.48 ns → 313.30 ns (-188.18 ns, -37.5%)
Throughput time: 293.74 ns → 105.03 ns (-188.71 ns, -64.2%)
Improvement from eliminating two intermediate branch blocks (14→13 blocks).

ggml_log_internal - build.bin.sd-server:

Response time: 447.22 ns → 403.26 ns (-43.96 ns, -9.8%)
Throughput time: 174.28 ns → 130.32 ns (-43.96 ns, -25.2%)
Block consolidation (8→7) improves frequently-called GGML logging infrastructure.

alloc_params_ctx - build.bin.sd-server:

Response time: 3,553.37 ns → 3,375.59 ns (-177.78 ns, -5.0%)
Throughput time: 192.67 ns → 192.66 ns (unchanged)
Improvement from downstream optimizations in ggml_init calls (-89 ns each). Critical for model initialization.

Red-Black Tree Operations - build.bin.sd-server:
Three related functions (_M_get_insert_unique_pos variants, _M_insert) show consistent regressions (+27-45% throughput time, +34-68 ns absolute) from compiler-generated entry block overhead and extra indirection. Used by std::map operations.

Other analyzed functions (vector constructor, hashtable deallocation, initialization checks, regex operations) showed minor changes with mixed improvements and regressions, all under 160 ns absolute impact.

Additional Findings

Source Code Context: Single commit replaced rand()/srand() with std::minstd_rand in examples/common/common.hpp. Performance variations stem from recompilation effects on STL template instantiation and code layout, not algorithmic changes.

ML Infrastructure Impact: GGML infrastructure improvements (logging -25%, context allocation -5%) benefit inference monitoring and model initialization. TensorStorage::begin() regression (+289%) affects model loading but is one-time initialization cost, not hot inference path.

Cross-Function Effects: Red-black tree regressions compound (~102 ns per map insertion). Vector operation improvements (back: -73%) offset initialization regressions. High-frequency function improvements (logging) outweigh low-frequency regressions (initialization), resulting in near-zero net power impact.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

loci-dev temporarily deployed to stable-diffusion-cpp-prod March 5, 2026 04:16 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 3 times, most recently from dd19ab8 to 98460a7 Compare March 10, 2026 04:15

chore: replace rand and srand at the library level

1bfd831

These functions have global state, so they could interfere with application behavior.

loci-dev force-pushed the main branch from 98460a7 to b898db0 Compare March 17, 2026 04:17

loci-dev force-pushed the loci/pr-1318-sd_replace_rand branch from 18d93ce to 1bfd831 Compare March 17, 2026 04:42

loci-dev temporarily deployed to stable-diffusion-cpp-prod March 17, 2026 04:42 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UPSTREAM PR #1318: chore: replace rand and srand at the library level#76

UPSTREAM PR #1318: chore: replace rand and srand at the library level#76
loci-dev wants to merge 1 commit intomainfrom
loci/pr-1318-sd_replace_rand

loci-dev commented Mar 5, 2026

Uh oh!

loci-review bot commented Mar 5, 2026

Uh oh!

loci-review bot commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

loci-dev commented Mar 5, 2026

Uh oh!

loci-review bot commented Mar 5, 2026

Overview

Function Analysis

Additional Findings

Uh oh!

loci-review bot commented Mar 17, 2026

Overview

Function Analysis

Additional Findings

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants