
Coyote accelerator backend#1347

Open
bo3z wants to merge 14 commits into fastmachinelearning:main from bo3z:coyote-accelerator

Conversation

Contributor

@bo3z bo3z commented Jul 28, 2025

Description

📝 This PR introduces a new accelerator backend, CoyoteAccelerator, which leverages the open-source Coyote shell for deploying models on a PCI-attached FPGA.

Compared to other shells, Coyote generally offers several advantages, including:

  • Networking support, so the backend can easily be extended to support distributed inference; also interesting for in-network ML.
  • GPU-FPGA integration, so models can be executed on a combination of hardware.
  • Dynamic reconfiguration, which could allow run-time reconfiguration of models.
  • Multi-tenancy, so multiple models could be deployed concurrently.

The backend is briefly described in Section 9.7 of the paper: https://arxiv.org/pdf/2504.21538.

Type of change

  • New feature (non-breaking change which adds functionality)
  • A new research paper code implementation

Tests

This backend was compared against a modified version of the VivadoAccelerator backend: that backend was modified to run HLS synthesis with Vitis instead of Vivado (also using Vitis templates and optimizers), while the rest of the backend infrastructure (drivers, data movers) remained the same, since it also works in newer versions of Vivado. Results are attached below, clearly indicating an advantage for Coyote, for two reasons: (1) optimised data movement, bypassing card memory, and (2) an optimised host-side library (Python, C++).

In principle, the correct test would be to compare against VitisAccelerator (#991), but only after the io_parallel issues are resolved. However, the expectation is that the result will stay mostly the same, since the underlying platform requires a data copy between host and card memory.

Will add some more results, also for io_stream CNNs, and comparisons to VitisAccelerator.


Figure above: comparison of CoyoteAccelerator with modified Vivado Accelerator for the UNSW-NB15 dataset in io_parallel.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@JanFSchulte JanFSchulte added this to the v1.3.0 milestone Nov 5, 2025
lorenzo-as pushed a commit to lorenzo-as/hls4ml that referenced this pull request Dec 9, 2025
…-backend (fastmachinelearning#1347)

Merge branch 'init_interval_fix_zeropad_maxpooling' into coyote-accelerator-and-pooling
Contributor

@JanFSchulte JanFSchulte left a comment


A few misc comments based on trying to run the CoyoteAccelerator for a dummy model. Right now, I am stuck with a Python import error:
[screenshot of a jinja2 import error]

which is puzzling because I do have jinja2 installed in my environment and the same import works fine in an interactive python session.

Also, can you fix the pre-commit issues?

filedir = os.path.dirname(os.path.abspath(__file__))
srcpath = os.path.join(filedir, '../contrib/Coyote/')
dstpath = f'{model.config.get_output_dir()}/Coyote'
copytree(srcpath, dstpath)
Contributor

Do we want to use the dirs_exist_ok argument here? In the current version, this fails when running for the same project twice.

Contributor Author

Sure, will add.
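For reference, a minimal sketch of the suggested fix; the temporary directories here are illustrative stand-ins for the real `srcpath` (contrib/Coyote) and `dstpath` (the project output directory):

```python
import os
import shutil
import tempfile

# Stand-ins for the real source and destination directories
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()  # destination already exists, as on a second run
open(os.path.join(src, 'template.sv'), 'w').close()

# dirs_exist_ok=True (Python 3.8+) makes the copy idempotent, so running
# the backend twice for the same project no longer raises FileExistsError
shutil.copytree(src, dst, dirs_exist_ok=True)
shutil.copytree(src, dst, dirs_exist_ok=True)  # second run: no error
print(sorted(os.listdir(dst)))  # -> ['template.sv']
```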

)

if not os.path.exists(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw'):
os.mkdir(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw')
Contributor

I think this needs to use os.makedirs() because the build folder doesn't exist already.

Contributor Author

Will change.
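A sketch of the suggested change, with a temporary directory standing in for the model output directory:

```python
import os
import tempfile

out_dir = tempfile.mkdtemp()  # stands in for model.config.get_output_dir()
project = 'myproject'         # stands in for model.config.get_project_name()

# os.makedirs creates the missing intermediate 'build' directory as well;
# exist_ok=True additionally makes repeated runs safe
hw_dir = os.path.join(out_dir, 'build', f'{project}_cyt_hw')
os.makedirs(hw_dir, exist_ok=True)
print(os.path.isdir(hw_dir))  # -> True
```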

Contributor

@vloncar vloncar left a comment


Looks good. I mostly have questions for better understanding, plus minor nitpicks that I don't feel are crucial; treat them as optional.

if not os.path.exists(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw'):
os.mkdir(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw')
os.chdir(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw')
os.system(cmake_cmd)
Contributor

I would argue that new code (despite being inspired by existing code) should use subprocess instead of os.system and we gradually move towards phasing out os.system since it has limitations on tracking status.

Contributor Author

Will change.
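A sketch of the subprocess-based replacement; a trivial Python command stands in for the real cmake invocation:

```python
import subprocess
import sys

# subprocess.run returns a CompletedProcess, so the exit status, stdout and
# stderr can all be inspected (and check=True raises CalledProcessError on
# failure), unlike os.system, which only returns a raw status code.
result = subprocess.run(
    [sys.executable, '-c', 'print("build ok")'],  # stand-in for cmake_cmd
    capture_output=True,
    text=True,
    check=True,
)
print(result.returncode, result.stdout.strip())  # -> 0 build ok
```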

if len(X.shape) == 1:
X = np.array([X])
if not (isinstance(X.dtype, float) or isinstance(X.dtype, np.float32)):
logging.warning('CoyoteOverlay only supports (for now) floating-point inputs; casting input data to float')
Contributor

Have we completely switched to the logging module, or do we still use warnings? Do these two play along nicely?

Contributor Author

I am not sure.

I see three instances in the code:

  • In some cases the function warn(...) is used.
  • In some other cases, a normal print is used, e.g., print('WARNING:...')
  • Some code (mostly mine), uses logging.warning.

I've never had issues with logging.warning but happy to change as needed.
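For what it's worth, the two mechanisms can be bridged: `logging.captureWarnings(True)` routes `warnings.warn(...)` calls through the `py.warnings` logger, so both styles end up in the same logging pipeline. A minimal sketch:

```python
import logging
import warnings

records = []

class ListHandler(logging.Handler):
    """Collects log records so we can see what reached the logging pipeline."""
    def emit(self, record):
        records.append(record)

logging.captureWarnings(True)           # route warnings.warn through logging
logging.getLogger().addHandler(ListHandler())

logging.getLogger(__name__).warning('casting input data to float')  # logging style
warnings.warn('casting input data to float')                        # warnings style

print(len(records))  # -> 2: both warnings reached the root logger
```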

#include "defines.h"
#include "host_libs.hpp"

#include <boost/program_options.hpp>
Contributor

Does this require us to have Boost library installed on the host or that comes with Coyote?

Contributor Author

Yes, it's required, and, no, it's not installed with Coyote.

One can argue that Boost shouldn't be needed for something as simple as CLI parsing, but Coyote relies internally on Boost for some other functionality (inter-process mutexes).

Currently, Coyote's compilation flow (i.e. CMake) will check if Boost is installed and throw an error if not. To me this sounds okay, as Boost is a fairly straightforward library to install.

avg_latency += (time / 1e3);
avg_throughput += (batch_size / (time * 1e-9));

// Functional correctness
Contributor

Do you really need this? Is there a way to just run without checks, as in production?

Contributor Author

I can add a CLI argument, which is disabled by default?


}

std::cout << "Batches processed: " << total_batches << std::endl;
Contributor

Similar comment to the Python side with predict()

Contributor Author

I can add a CLI argument, verbose?


filedir = os.path.dirname(os.path.abspath(__file__))

f = open(os.path.join(filedir, '../templates/vivado/firmware/myproject.cpp'))
Contributor

Imagine how cool it would be if we used pathlib.Path (which is imported) and resource management (with) like the other modern backends do... 😄

Contributor Author

Will change.
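A sketch of the pathlib + context-manager style; a temporary file stands in for `templates/vivado/firmware/myproject.cpp`:

```python
import tempfile
from pathlib import Path

tmpdir = Path(tempfile.mkdtemp())
template = tmpdir / 'myproject.cpp'   # stand-in for the real template path
template.write_text('// firmware template\n')

# Path objects compose with '/', and 'with' closes the handle automatically,
# replacing os.path.join plus a bare open() that is never closed
with template.open() as f:
    contents = f.read()
print(contents.strip())  # -> // firmware template
```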

Contributor

This is several months old now, do you want to update it?

Contributor Author

Yes, once I open-source and merge the V80 code with upstream (hopefully next week), I can update the submodule.

@bo3z bo3z added the feature New hls4ml feature label Jan 30, 2026
@JanFSchulte
Contributor

Just confirming that installing jinja2 system-wide solved my issues and I can now successfully run synthesis and create bit files using this backend.

@bo3z
Contributor Author

bo3z commented Feb 13, 2026

That's good to know. At some point, I should update Coyote to make sure that jinja2 isn't required system-wide.

If it also works for inference, that's great.
