[Bug] Illegal instruction (core dumped) #1351

### Git commit

None; using the prebuilt release binary:
https://github.com/leejet/stable-diffusion.cpp/releases/download/master-535-84cbd88/sd-master-84cbd88-bin-Linux-Ubuntu-24.04-x86_64-vulkan.zip

### Operating System & Version

Debian GNU/Linux forky/sid (forky) x86_64

### GGML backends

Vulkan

### Command-line arguments used

./sd-cli -v -m ./v1-5-pruned-emaonly.safetensors -p "a lovely cat"

### Steps to reproduce

Ran `./sd-cli -v -m ./v1-5-pruned-emaonly.safetensors -p "a lovely cat"` in Konsole from the program directory.

### What you expected to happen

An image should be generated.

### What actually happened

After loading the model, the program crashed with the following message and generated nothing:

> Illegal instruction        (core dumped) ./sd-cli -v -m ./v1-5-pruned-emaonly.safetensors -p "a lovely cat"

### Logs / error messages / stack trace

[DEBUG] main.cpp:515  - version: stable-diffusion.cpp version unknown, commit 5792c66
[DEBUG] main.cpp:516  - System Info: 
    SSE3 = 1 |     AVX = 1 |     AVX2 = 1 |     AVX512 = 0 |     AVX512_VBMI = 0 |     AVX512_VNNI = 0 |     FMA = 1 |     NEON = 0 |     ARM_FMA = 0 |     F16C = 1 |     FP16_VA = 0 |     WASM_SIMD = 0 |     VSX = 0 | 
[DEBUG] main.cpp:517  - SDCliParams {
  mode: img_gen,
  output_path: "output.png",
  verbose: true,
  color: false,
  canny_preprocess: false,
  convert_name: false,
  preview_method: none,
  preview_interval: 1,
  preview_path: "preview.png",
  preview_fps: 16,
  taesd_preview: false,
  preview_noisy: false
}
[DEBUG] main.cpp:518  - SDContextParams {
  n_threads: 4,
  model_path: "./v1-5-pruned-emaonly.safetensors",
  clip_l_path: "",
  clip_g_path: "",
  clip_vision_path: "",
  t5xxl_path: "",
  llm_path: "",
  llm_vision_path: "",
  diffusion_model_path: "",
  high_noise_diffusion_model_path: "",
  vae_path: "",
  taesd_path: "",
  esrgan_path: "",
  control_net_path: "",
  embedding_dir: "",
  embeddings: {
  }
  wtype: NONE,
  tensor_type_rules: "",
  lora_model_dir: ".",
  photo_maker_path: "",
  rng_type: cuda,
  sampler_rng_type: NONE,
  offload_params_to_cpu: false,
  enable_mmap: false,
  control_net_cpu: false,
  clip_on_cpu: false,
  vae_on_cpu: false,
  flash_attn: false,
  diffusion_flash_attn: false,
  diffusion_conv_direct: false,
  vae_conv_direct: false,
  circular: false,
  circular_x: false,
  circular_y: false,
  chroma_use_dit_mask: true,
  qwen_image_zero_cond_t: false,
  chroma_use_t5_mask: false,
  chroma_t5_mask_pad: 1,
  prediction: NONE,
  lora_apply_mode: auto,
  vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
  force_sdxl_vae_conv_scale: false
}
[DEBUG] main.cpp:519  - SDGenerationParams {
  loras: "{
  }",
  high_noise_loras: "{
  }",
  prompt: "a lovely cat",
  negative_prompt: "",
  clip_skip: -1,
  width: -1,
  height: -1,
  batch_count: 1,
  init_image_path: "",
  end_image_path: "",
  mask_image_path: "",
  control_image_path: "",
  ref_image_paths: [],
  control_video_path: "",
  auto_resize_ref_image: true,
  increase_ref_index: false,
  pm_id_images_dir: "",
  pm_id_embed_path: "",
  pm_style_strength: 20,
  skip_layers: [7, 8, 9],
  sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0, flow_shift: inf),
  high_noise_skip_layers: [7, 8, 9],
  high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0, flow_shift: inf),
  custom_sigmas: [],
  cache_mode: "",
  cache_option: "",
  cache: disabled (threshold=1, start=0.15, end=0.95),
  moe_boundary: 0.875,
  video_frames: 1,
  fps: 16,
  vace_strength: 1,
  strength: 0.75,
  control_strength: 0.9,
  seed: 42,
  upscale_repeats: 1,
  upscale_tile_size: 128,
}
[DEBUG] stable-diffusion.cpp:177  - Using Vulkan backend
[DEBUG] ggml_extend.hpp:75   - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:75   - ggml_vulkan: 0 = AMD Radeon RX 580 Series (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
[INFO ] stable-diffusion.cpp:198  - Vulkan: Using device 0
[INFO ] stable-diffusion.cpp:256  - loading model from './v1-5-pruned-emaonly.safetensors'
[INFO ] model.cpp:369  - load ./v1-5-pruned-emaonly.safetensors using safetensors format
[DEBUG] model.cpp:503  - init from './v1-5-pruned-emaonly.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:341  - Version: SD 1.x 
[INFO ] stable-diffusion.cpp:369  - Weight type stat:                      f32: 1131 
[INFO ] stable-diffusion.cpp:370  - Conditioner weight type stat:          f32: 196  
[INFO ] stable-diffusion.cpp:371  - Diffusion model weight type stat:      f32: 686  
[INFO ] stable-diffusion.cpp:372  - VAE weight type stat:                  f32: 248  
[DEBUG] stable-diffusion.cpp:374  - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:161  - vocab size: 49408
[DEBUG] clip.hpp:172  - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1949 - clip params backend buffer size =  469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1949 - unet params backend buffer size =  2155.33 MB(VRAM) (686 tensors)
[DEBUG] ggml_extend.hpp:1949 - vae params backend buffer size =  94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:781  - loading weights
[DEBUG] model.cpp:1350 - using 4 threads for model loading
[DEBUG] model.cpp:1372 - loading tensors from ./v1-5-pruned-emaonly.safetensors
  |==================================================| 1131/1131 - 95.11it/s
[INFO ] model.cpp:1592 - loading tensors completed, taking 11.90s (process: 0.00s, read: 8.09s, memcpy: 0.00s, convert: 1.00s, copy_to_backend: 1.17s)
[DEBUG] stable-diffusion.cpp:816  - finished loaded file
[INFO ] stable-diffusion.cpp:874  - total params memory size = 2719.24MB (VRAM 2719.24MB, RAM 0.00MB): text_encoders 469.44MB(VRAM), diffusion_model 2155.33MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:944  - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:3528 - generate_image 512x512
[INFO ] stable-diffusion.cpp:3564 - sampling using Euler A method
[INFO ] denoiser.hpp:494  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3691 - TXT2IMG
Illegal instruction        (core dumped) ./sd-cli -v -m ./v1-5-pruned-emaonly.safetensors -p "a lovely cat"


### Additional context / environment details

OS: Debian GNU/Linux forky/sid (forky) x86_64
Kernel: Linux 6.18.15+deb14-amd64
Shell: bash 5.3.9
DE: KDE Plasma 6.5.4
WM: KWin (X11)
CPU: AMD A10-7860K Radeon R7, 12 Compute Cores 4C+8G (4) @ 3.60 GHz
GPU: AMD Radeon RX 580 Series [Discrete]
Memory: 5.95 GiB / 7.71 GiB (77%)

mesa-vulkan-drivers/testing,now 26.0.0-1 amd64 [installed]
mesa-vulkan-drivers/testing,now 26.0.0-1 i386 [installed,automatic]


[KoboldCpp](https://github.com/LostRuins/koboldcpp) does work; I tried it with the [latest release](https://github.com/LostRuins/koboldcpp/releases/download/v1.109.2/koboldcpp-linux-x64-nocuda).

