UPSTREAM PR #1356: docs: updated model file info #86

Open
loci-dev wants to merge 3 commits into main from loci/pr-1356-leejet_Mar18
@loci-dev
Note

Source pull request: leejet/stable-diffusion.cpp#1356

  1. Download links are now provided in place of the conversion instructions.
  2. References to ckpt files have been replaced with references to SafeTensors files.

These changes were made because this file is aimed at less experienced users, who do not need *.ckpt files and can download ready-made models instead of converting them themselves.

@loci-dev loci-dev temporarily deployed to stable-diffusion-cpp-prod March 19, 2026 04:23 — with GitHub Actions Inactive
@loci-review
loci-review bot commented Mar 19, 2026

No meaningful performance changes were detected across 49622 analyzed functions in the following binaries: build.bin.sd-server, build.bin.sd-cli.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

akleine added 2 commits March 19, 2026 10:16
To better distinguish between both SDXS versions,
the "old" VERSION_SDXS is now called VERSION_SDXS_512_DS,
where DS stands for the "DreamShaper" edition by IDKiro.
@loci-dev loci-dev temporarily deployed to stable-diffusion-cpp-prod March 21, 2026 04:56 — with GitHub Actions Inactive
@loci-review
loci-review bot commented Mar 21, 2026

Overview

Analysis of 49,631 functions (92 modified, 0 new, 0 removed) shows minimal performance impact from SDXS-09 model support addition. Power consumption improved slightly: build.bin.sd-server -0.153% (528,347.68 → 527,536.81 nJ), build.bin.sd-cli -0.033% (491,821.56 → 491,660.58 nJ).
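The percentages quoted above can be reproduced from the raw energy figures. The following is a minimal sketch of that arithmetic; the helper name `percent_change` is hypothetical and is not part of the Loci tooling or of stable-diffusion.cpp.

```cpp
// Hypothetical helper: relative change from `before` to `after`, in percent.
// A negative result means the value went down (here: less energy consumed).
constexpr double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}

// Energy figures in nJ, copied from the overview above.
constexpr double kServerBefore = 528347.68;
constexpr double kServerAfter  = 527536.81;
constexpr double kCliBefore    = 491821.56;
constexpr double kCliAfter     = 491660.58;
```

Calling `percent_change(kServerBefore, kServerAfter)` gives roughly -0.153, and `percent_change(kCliBefore, kCliAfter)` gives roughly -0.033, matching the reported values.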

Function Analysis

Intentional feature additions (justified):

  • sd_version_is_sd2() (4 instances, both binaries): +26.4% response time (+12ns absolute) due to added VERSION_SDXS_09 check. Expected overhead for model classification.
  • UnetModelBlock lambda operator (both binaries): +37% throughput (+28ns), +0.04-0.05% response time (+242-321ns). Adds SDXS-09 attention head remapping (5 heads→1 head, 64→320 dims) for inference optimization.
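To illustrate why one extra version check adds a few nanoseconds to a classifier such as `sd_version_is_sd2()`, here is a simplified sketch. The enum below is a reduced, hypothetical stand-in: only the names `VERSION_SDXS_512_DS` and `VERSION_SDXS_09` come from this pull request, and the real enum and function in stable-diffusion.cpp may look different.

```cpp
namespace sketch {

// Hypothetical, reduced version enum (assumption, not the project's definition).
enum SDVersion {
    VERSION_SD1,
    VERSION_SD2,
    VERSION_SDXS_512_DS,  // formerly VERSION_SDXS ("DreamShaper" edition)
    VERSION_SDXS_09,      // newly added model version
    VERSION_COUNT
};

// Classifier sketch: each additional model version is one more comparison.
constexpr bool sd_version_is_sd2(SDVersion version) {
    return version == VERSION_SD2 || version == VERSION_SDXS_09;
}

}  // namespace sketch
```

In this sketch the new `VERSION_SDXS_09` comparison is the only added work, which is consistent with a small constant overhead per call.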

Standard library regressions (compiler-related):

  • std::_Hashtable::end() (sd-server): +138% response time (+162ns). Code reorganization with extra basic blocks, no source changes.
  • __gnu_cxx::__normal_iterator::operator- (sd-server): +83% response time (+75ns). Unnecessary jump indirection added.
  • std::map::operator[] (sd-server): +42% throughput (+62ns), +1.4% response time. Entry block reorganization overhead.

Standard library improvements:

  • std::vector::back() (sd-cli): -42% response time (-190ns), -73% throughput. Entry block consolidation benefits GPU tensor buffer access.
  • std::_Sp_counted_ptr_inplace::_M_destroy (sd-cli): -38% response time (-189ns), -64% throughput. Redundant loop elimination.
  • ggml_log_internal (sd-server): -10% response time (-45ns), -25% throughput. Block consolidation optimization.

Other analyzed functions showed negligible changes.

Additional Findings

The attention head remapping for SDXS-09 (5→1 heads with proportionally larger dimensions) maintains mathematical equivalence while reducing multi-head attention overhead. This optimization is expected to improve inference performance during denoising iterations, though benefits aren't captured in initialization-phase metrics. STL regressions stem from compiler optimization differences rather than application code changes, with absolute impacts (62-162ns) remaining small relative to inference workloads (milliseconds to seconds).
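The head remapping described above keeps the total attention width unchanged: 5 heads of 64 dimensions and 1 head of 320 dimensions both span 320 channels. The sketch below only checks that dimension bookkeeping and does not model the attention computation itself. `HeadLayout` and both functions are hypothetical names, not identifiers from stable-diffusion.cpp.

```cpp
// Hypothetical description of a multi-head attention layout.
struct HeadLayout {
    int num_heads;  // number of attention heads
    int head_dim;   // channels per head
};

// Total channel width covered by all heads together.
constexpr int inner_dim(HeadLayout layout) {
    return layout.num_heads * layout.head_dim;
}

// Collapse any layout into a single head spanning the same total width.
constexpr HeadLayout collapse_to_single_head(HeadLayout layout) {
    return HeadLayout{1, layout.num_heads * layout.head_dim};
}
```

For example, `collapse_to_single_head(HeadLayout{5, 64})` yields one head with `head_dim == 320`, so `inner_dim` is 320 before and after the remap.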

