Stale comment in examples/cute/tutorial/hopper/wgmma_sm90.cu #3340

Description

@gambiTarun

In gemm_nt, the comments on the tiled copies copyA and copyB say the thread layout is 32x4, but the layout actually passed to make_tiled_copy is 16x8.

  // Define the thread layouts (static)
  TiledCopy copyA = make_tiled_copy(Copy_Atom<SM80_CP_ASYNC_CACHEALWAYS<uint128_t>, TA>{},
                                    Layout<Shape<_16,_8>>{}, // Thr layout 32x4 m-major
                                    Layout<Shape< _8,_1>>{});// Val layout  8x1 m-major
  TiledCopy copyB = make_tiled_copy(Copy_Atom<SM80_CP_ASYNC_CACHEALWAYS<uint128_t>, TB>{},
                                    Layout<Shape<_16,_8>>{}, // Thr layout 32x4 n-major
                                    Layout<Shape< _8,_1>>{});// Val layout  8x1 n-major
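To make the mismatch concrete, here is a minimal host-only sketch of the shape arithmetic. It does not use CuTe; `Extent2D`, `num_threads`, and `tile_extent` are hypothetical helpers introduced only for illustration. It assumes that the tile covered by a tiled copy is the element-wise product of the thread-layout and value-layout shapes, which is my understanding of make_tiled_copy rather than something stated in this issue.

```cpp
#include <cstddef>

// Hypothetical helper (not a CuTe type): a plain 2-D shape, rows x cols.
struct Extent2D {
  std::size_t rows;
  std::size_t cols;
};

// Number of entries in a shape, e.g. threads in a thread layout.
constexpr std::size_t num_threads(Extent2D e) { return e.rows * e.cols; }

// Assumption: each thread owns a `val`-shaped block of elements, so the
// tile covered is the element-wise product of the two shapes.
constexpr Extent2D tile_extent(Extent2D thr, Extent2D val) {
  return {thr.rows * val.rows, thr.cols * val.cols};
}

constexpr Extent2D thr_in_code{16, 8};     // Layout<Shape<_16,_8>> in the source
constexpr Extent2D thr_in_comment{32, 4};  // what the stale comment claims
constexpr Extent2D val{8, 1};              // Layout<Shape<_8,_1>> in the source

// Both shapes describe 128 threads, so the thread count alone does not
// reveal the mismatch.
static_assert(num_threads(thr_in_code) == 128, "16x8 is 128 threads");
static_assert(num_threads(thr_in_comment) == 128, "32x4 is 128 threads");

// The covered tiles differ: 128x8 for the code, 256x4 for the comment.
static_assert(tile_extent(thr_in_code, val).rows == 128, "code tile rows");
static_assert(tile_extent(thr_in_code, val).cols == 8, "code tile cols");
static_assert(tile_extent(thr_in_comment, val).rows == 256, "comment tile rows");
static_assert(tile_extent(thr_in_comment, val).cols == 4, "comment tile cols");
```

Under that assumption the two layouts use the same number of threads but cover different tiles, so the comment describes a different partitioning than the one the code performs. Changing both comments to read "Thr layout 16x8" would bring them back in line with the code.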
