import numpy as np
from numpy_quaddtype import QuadPrecision, QuadPrecDType
dt = QuadPrecDType(backend='sleef')
def Q(v): return QuadPrecision(str(float(v)), backend='sleef')
A = np.asfortranarray(np.array([[Q(1), Q(2), Q(3)],
[Q(4), Q(5), Q(6)]], dtype=dt))
B = np.asfortranarray(np.array([[Q(1), Q(2)],
[Q(3), Q(4)],
[Q(5), Q(6)]], dtype=dt))
# A is F-contig: strides=(16, 32); B is F-contig: strides=(16, 48)
print("quad F@F:", [[float(v) for v in r] for r in (A @ B)])
# -> [[23.0, 27.0], [35.0, 32.0]] <- WRONG
# Same call against native dtypes works fine — proving the bug is on our side:
A32 = np.asfortranarray(np.array([[1., 2., 3.], [4., 5., 6.]], dtype=np.float32))
B32 = np.asfortranarray(np.array([[1., 2.], [3., 4.], [5., 6.]], dtype=np.float32))
print("float32 F@F:", (A32 @ B32).tolist())
# -> [[22.0, 28.0], [49.0, 64.0]] <- correct
# Workaround: force C-contig before matmul
print("quad (C@C):",
[[float(v) for v in r] for r in (np.ascontiguousarray(A) @ np.ascontiguousarray(B))])
# -> [[22.0, 28.0], [49.0, 64.0]] <- correct
np.matmulon QuadPrecision arrays returns incorrect results whenever either input is F-contiguous (or otherwise not row-major-contiguous in its last two axes).Reproducer
click to expand
Root cause
quad_matmul_strided_loop_alignedinsrc/csrc/umath/matmul.cppunconditionally:This silently assumes:
'R'.Possible fixes
qblas_gemm('R', ...)or('C', ...)when both operands share a contiguous layout, otherwise memcpy the offending operand(s) into a row-major temp buffer (same trick the unaligned path already uses) and callqblas_gemm('R', ...).transa/transb='T'instead, which would requireqblas_gemmto gain transpose support. See [FEAT] Support transpose flagstransa, transb != 'N'in GEMM SwayamInSync/QBLAS#13Related tests
On the #88 two tests already exercise this bug and are marked
xfail:tests/test_dot.py::TestMatmulBatched::test_batched_fortran_ordertests/test_dot.py::TestMatmulBatched::test_batched_transposed_view