UPSTREAM PR #1356: docs: updated model file info by loci-dev · Pull Request #86 · auroralabs-loci/stable-diffusion.cpp

loci-dev · 2026-03-19T04:23:48Z

Note

Source pull request: leejet/stable-diffusion.cpp#1356

Download links are now provided in place of the conversion instructions.
References to ckpt files have been replaced with references to SafeTensors files.

These changes were made because this file is intended for less experienced users. They do not need to use *.ckpt files and can download the models instead of converting them themselves.


loci-review · 2026-03-19T05:10:25Z

No meaningful performance changes were detected across 49622 analyzed functions in the following binaries: build.bin.sd-server, build.bin.sd-cli.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev


loci-review · 2026-03-21T05:57:41Z

Overview

Analysis of 49,631 functions (92 modified, 0 new, 0 removed) shows minimal performance impact from SDXS-09 model support addition. Power consumption improved slightly: build.bin.sd-server -0.153% (528,347.68 → 527,536.81 nJ), build.bin.sd-cli -0.033% (491,821.56 → 491,660.58 nJ).
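The percentages quoted above follow from the before and after energy figures. A minimal sketch of that arithmetic is given here; the helper names `relative_change_percent` and `print_energy_delta` are made up for illustration and are not part of the project or of the review tooling.

```cpp
#include <cstdio>

// Relative change from `before` to `after`, in percent.
// Negative values mean the metric decreased.
constexpr double relative_change_percent(double before, double after) {
    return (after - before) / before * 100.0;
}

// Prints one report line for a binary; energy values are in nanojoules.
// Hypothetical helper, only meant to mirror the format used in the overview.
inline void print_energy_delta(const char* binary, double before_nj, double after_nj) {
    std::printf("%s: %+.3f%% (%.2f -> %.2f nJ)\n", binary,
                relative_change_percent(before_nj, after_nj), before_nj, after_nj);
}
```

Calling `print_energy_delta("build.bin.sd-server", 528347.68, 527536.81)` and `print_energy_delta("build.bin.sd-cli", 491821.56, 491660.58)` should reproduce the reported -0.153% and -0.033% after rounding to three decimals.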

Function Analysis

Intentional feature additions (justified):

sd_version_is_sd2() (4 instances, both binaries): +26.4% response time (+12ns absolute) due to added VERSION_SDXS_09 check. Expected overhead for model classification.
UnetModelBlock lambda operator (both binaries): +37% throughput (+28ns), +0.04-0.05% response time (+242-321ns). Adds SDXS-09 attention head remapping (5 heads→1 head, 64→320 dims) for inference optimization.
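The head remapping mentioned for SDXS-09 keeps the total attention width constant: 5 heads of 64 dimensions and 1 head of 320 dimensions both give an inner width of 320. The sketch below only illustrates that bookkeeping. `HeadConfig` and `collapse_to_single_head` are hypothetical names, not taken from the repository, and the snippet says nothing about the numerical results of the attention computation itself.

```cpp
#include <cstdint>

// Hypothetical container for attention head geometry; not a type from the repository.
struct HeadConfig {
    int64_t num_heads;
    int64_t dim_head;
};

// Total inner width of the attention projection (heads times per-head dimension).
constexpr int64_t inner_dim(HeadConfig c) {
    return c.num_heads * c.dim_head;
}

// Collapse all heads into a single head while keeping the inner width unchanged.
constexpr HeadConfig collapse_to_single_head(HeadConfig c) {
    return HeadConfig{1, inner_dim(c)};
}
```

With `HeadConfig{5, 64}` as input, `collapse_to_single_head` yields one head of dimension 320, matching the figures in the review.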

Standard library regressions (compiler-related):

std::_Hashtable::end() (sd-server): +138% response time (+162ns). Code reorganization with extra basic blocks, no source changes.
__gnu_cxx::__normal_iterator::operator- (sd-server): +83% response time (+75ns). Unnecessary jump indirection added.
std::map::operator[] (sd-server): +42% throughput (+62ns), +1.4% response time. Entry block reorganization overhead.

Standard library improvements:

std::vector::back() (sd-cli): -42% response time (-190ns), -73% throughput. Entry block consolidation benefits GPU tensor buffer access.
std::_Sp_counted_ptr_inplace::_M_destroy (sd-cli): -38% response time (-189ns), -64% throughput. Redundant loop elimination.
ggml_log_internal (sd-server): -10% response time (-45ns), -25% throughput. Block consolidation optimization.

Other analyzed functions showed negligible changes.

Additional Findings

The attention head remapping for SDXS-09 (5→1 heads with proportionally larger dimensions) maintains mathematical equivalence while reducing multi-head attention overhead. This optimization is expected to improve inference performance during denoising iterations, though benefits aren't captured in initialization-phase metrics. STL regressions stem from compiler optimization differences rather than application code changes, with absolute impacts (62-162ns) remaining small relative to inference workloads (milliseconds to seconds).
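To put the reported deltas in proportion, the following sketch expresses a nanosecond-scale overhead as a percentage of a reference duration. The 1 ms reference is an assumption taken from the lower end of the "milliseconds to seconds" range stated above, and `overhead_percent` is a made-up helper name.

```cpp
// Number of nanoseconds in one millisecond.
constexpr double kNsPerMs = 1.0e6;

// Overhead as a percentage of a reference duration; both arguments are in nanoseconds.
constexpr double overhead_percent(double overhead_ns, double reference_ns) {
    return overhead_ns / reference_ns * 100.0;
}

// Largest regression quoted in the review (162 ns) against an assumed 1 ms workload.
constexpr double kWorstCasePercent = overhead_percent(162.0, 1.0 * kNsPerMs);
```

Under that assumption the largest quoted regression comes to roughly 0.016% of the workload, and the share shrinks further for workloads that run for seconds.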


loci-dev temporarily deployed to stable-diffusion-cpp-prod March 19, 2026 04:23 — with GitHub Actions Inactive

akleine added 2 commits March 19, 2026 10:16

feat: add support for SDXS-09

9f08bc0

To better distinguish between both SDXS versions, the "old" VERSION_SDXS is now called VERSION_SDXS_512_DS, where DS stands for the "DreamShaper" edition by IDKiro.
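A rough sketch of what the rename and the extra check in `sd_version_is_sd2()` could look like is shown below. The identifiers `VERSION_SDXS_512_DS`, `VERSION_SDXS_09`, and `sd_version_is_sd2` come from this pull request; the reduced enum, its other entries, and the function body are assumptions made for illustration and may differ from the actual source.

```cpp
// Reduced, hypothetical version enum; the real project is assumed to define more entries.
enum SDVersion {
    VERSION_SD1,
    VERSION_SD2,
    VERSION_SDXS_512_DS,  // formerly VERSION_SDXS ("DreamShaper" edition by IDKiro)
    VERSION_SDXS_09,      // variant added by this change
    VERSION_COUNT,
};

// Assumed shape of the family check: per the review, a VERSION_SDXS_09
// comparison was added to this function.
constexpr bool sd_version_is_sd2(SDVersion version) {
    return version == VERSION_SD2 || version == VERSION_SDXS_09;
}
```

Keeping the two SDXS variants as separate enum values lets family checks like this one treat them differently, which a single `VERSION_SDXS` value could not express.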

docs: update for SDXS-09

aee59cf

loci-dev temporarily deployed to stable-diffusion-cpp-prod March 21, 2026 04:56 — with GitHub Actions Inactive

loci-dev wants to merge 3 commits into main from loci/pr-1356-leejet_Mar18