    Repositories list

    • TypeScript
      0000 Updated Apr 14, 2026
    • aCG

      Public
      GPU-accelerated linear solvers based on the conjugate gradient (CG) method, supporting NVIDIA and AMD GPUs with GPU-aware MPI, NCCL, RCCL or NVSHMEM
      C
      MIT License
      61510 Updated Mar 14, 2026
    • mustard

      Public
      A Device-Side Execution Model for Multi-GPU Task Graphs
      Cuda
      MIT License
      1200 Updated Feb 28, 2026
    • ucTrace

      Public
      ucTrace: A Multi-Layer Profiling Tool for UCX-driven Communication
      Python
      MIT License
      0200 Updated Feb 24, 2026
    • Uniconn

      Public
      Uniconn is a unified, portable high-level C++ communication library that supports both point-to-point and collective operations across GPU clusters. Uniconn ena…
      Cuda
      MIT License
      0300 Updated Dec 17, 2025
    • HPC-docs

      Public
      Shell
      0000 Updated Nov 28, 2025
    • C++
      1300 Updated Mar 27, 2025
    • Modified ucx library to track communications
      C
      Other
      542000 Updated Mar 10, 2025
    • Cuda
      1410 Updated Jun 13, 2024
    • Snoopie

      Public
      Multi-GPU communication profiler and visualizer
      C
      Other
      43930 Updated Jun 10, 2024
    • GPU fusion code and algorithm
      Cuda
      MIT License
      0100 Updated May 24, 2024
    • barnes

      Public
      C
      0000 Updated May 15, 2024
    • 0020 Updated May 10, 2024
    • Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond th…
      Cuda
      MIT License
      32200 Updated Apr 25, 2024
    • C
      1300 Updated Apr 25, 2024
    • BeyondMoore has an ambitious goal to develop a software framework that performs static and dynamic optimizations, issues accelerator-initiated data transfers, a…
      0200 Updated Apr 25, 2024
    • .github

      Public
      Homepage README.
      0000 Updated Apr 4, 2024
    • C
      Other
      0000 Updated Mar 22, 2024
    • DaCe - Data Centric Parallel Programming
      Python
      BSD 3-Clause "New" or "Revised" License
      154000 Updated Feb 2, 2024
    • splash2

      Public
      Splash 2 Benchmarks
      C
      14000 Updated Nov 28, 2023
    • ComScribe

      Public
      ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.
      C++
      BSD 3-Clause "New" or "Revised" License
      42712 Updated Jul 6, 2023
    • C++
      0000 Updated Jun 13, 2023
    • HPCToolkit performance tools: measurement and analysis components
      C++
      60101 Updated Mar 17, 2023
    • The microbenchmarks that are used to verify the accuracy of ComDetective.
      Makefile
      2000 Updated Mar 17, 2023
    • Mixed and Multi-Precision SpMV for GPUs with Row-wise Precision Selection.
      Cuda
      MIT License
      1610 Updated Mar 12, 2023
    • A fast and accurate reuse distance analyzer for multi-threaded applications. It leverages existing hardware features in commodity CPUs.
      Shell
      32110 Updated Feb 3, 2023
    • HPCToolkit performance tools: essential third party libraries for hpctoolkit
      Shell
      Other
      6000 Updated Oct 9, 2022
    • AMD Research Instruction Based Sampling Toolkit
      C
      17000 Updated Aug 6, 2022
    • pardnn

      Public
      C++
      1100 Updated May 20, 2022
    • C
      1000 Updated Apr 16, 2022