Environment:
- Windows 11
- Docker Desktop v4.64.0
Problem: The Docker Model Runner HTTP API server does not respond to HTTP requests, even though all settings are configured correctly.

Settings enabled:
✅ Enable Docker Model Runner
✅ Enable host-side TCP support (port 12434)
✅ Enable GPU-backed inference

The model loads correctly onto the GPU (dedicated VRAM is in use).
The model loads fine onto the GPU via the CLI or Docker Desktop (docker model run gpt-oss "test" works), but curl http://localhost:12434/engines/v1/models returns "Empty reply from server".
Workaround discovered: when I change the port from 12434 to any other port (e.g., 12357), curl http://localhost:12357/engines/v1/models works immediately.
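The fact that every port except 12434 works suggests something else on the host is claiming that port, or that it falls inside a Windows excluded port range. As a quick check, here is a minimal sketch (not part of Docker's tooling; the host and port numbers are just the ones from this report) that probes which candidate ports actually accept a TCP connection:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Probe the configured port and the workaround port from this report.
    for port in (12434, 12357):
        state = "accepting connections" if port_open("127.0.0.1", port) else "no listener / blocked"
        print(f"port {port}: {state}")
```

If 12434 shows no listener even while Model Runner claims to be bound to it, it may be worth running netsh interface ipv4 show excludedportrange protocol=tcp on Windows: Hyper-V/WSL2 reserves dynamic port ranges, and a port inside one of those ranges can silently fail to bind, which would explain why any other port works.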