Skip to content

GPU: D3D12: remove NewFrame() API + drop unresolved queries when under pressure#1316

Open
slomp wants to merge 63 commits intowolfpld:masterfrom
slomp:slomp/d3d12-ring
Open

GPU: D3D12: remove NewFrame() API + drop unresolved queries when under pressure#1316
slomp wants to merge 63 commits intowolfpld:masterfrom
slomp:slomp/d3d12-ring

Conversation

@slomp
Copy link
Copy Markdown
Collaborator

@slomp slomp commented Mar 23, 2026

This PR removes the need for calling NewFrame() periodically.
(it also addresses and fixes #1294)

Some applications may run "headless" or may use the GPU for compute-only purposes, and do not have the concept of a frame boundary. Timestamp queries are now produced and collected in circular fashion.

In addition, the code is now resilient to situations where timestamp queries are reserved/dispensed, but the enclosing command list is never submitted to the GPU for execution. Such cases will flag warning messages and will "stall" progress, forcing "unresolvable" timestamps queries to be dropped. This also tricky covers corner cases where timestamp query ids are dispensed in one order, but the command lists are sent to the GPU queue in opposing order, or when an operation takes unusually long to complete with respect to everything else.

As such, queue signals are no longer required, nor useful: even with Signal/Wait, there are no guarantees that once the queue sets the signal value, all queries up to that value have indeed completed, since queries could be sent/executed out of the order. in which they were generated, or may not have been submitted to the queue yet (or ever).


The only scenario where ambiguity can still happen is as follows:

  1. a command buffer (CB1) is recorded, but not yet executed
  2. time goes by, and several command buffers are subsequently recorded and executed
  3. Collect() ends up giving up on CB1's "reserved" queries
  4. CB1 is finally sent to the queue and executed
  5. but around the same instant, another command buffer is recorded, claiming query ids that clash with CB1
  6. the GPU will resolve the CB1 timestamps, and write to the timestamp heap
  7. Collect() is not going to be able to disambiguate the timestamp

While technically possible, it is quite a contrived case. There does not seem to be a proper solution for the case above without some amount of costly query tracking along with some CPU-GPU synchronization on top of it. The current implementation is also prone to the same "bad-timing" problem anyway.

@slomp slomp marked this pull request as draft March 23, 2026 23:59
@slomp slomp changed the title eliminate NewFrame, and account for "abandoned" and "out-of-order" ti… D3D12: remove NewFrame() API; account for abandoned/out-of-order timestamp queries Mar 24, 2026
@slomp slomp force-pushed the slomp/d3d12-ring branch 3 times, most recently from 4ce6662 to bf3694e Compare March 29, 2026 21:08
@slomp slomp changed the title D3D12: remove NewFrame() API; account for abandoned/out-of-order timestamp queries D3D12: remove NewFrame() API, and introduce a timeout to Collect() Mar 29, 2026
@slomp slomp changed the title D3D12: remove NewFrame() API, and introduce a timeout to Collect() GPU: D3D12: remove NewFrame() API, and introduce a timeout to Collect() Mar 29, 2026
@slomp slomp force-pushed the slomp/d3d12-ring branch 5 times, most recently from 2a1400b to ee3ba91 Compare April 6, 2026 14:23
@slomp slomp changed the title GPU: D3D12: remove NewFrame() API, and introduce a timeout to Collect() GPU: D3D12: remove NewFrame() API + drop unresolved queries when under pressure Apr 8, 2026
@slomp slomp force-pushed the slomp/d3d12-ring branch from 1d49121 to 4b15bfa Compare April 8, 2026 21:50
@slomp slomp marked this pull request as ready for review April 9, 2026 16:29
@slomp slomp requested a review from Lectem April 9, 2026 16:29
@slomp slomp force-pushed the slomp/d3d12-ring branch from b5096f9 to c97c895 Compare April 15, 2026 21:00
Comment thread public/tracy/TracyD3D12.hpp Outdated
Comment thread public/tracy/TracyD3D12.hpp Outdated
Comment thread public/tracy/TracyD3D12.hpp
Comment thread public/tracy/TracyD3D12.hpp
Comment thread public/tracy/TracyD3D12.hpp
Comment thread public/tracy/TracyD3D12.hpp Outdated
TracyD3D12Assert( lock.owns_lock() );
TracyD3D12Debug( ZoneValue(m_contextId) );

uint64_t earliestTicket = m_previousCheckpoint;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the only writes done to m_previousCheckpoint are done behind the collect lock, you can make this a relaxed load

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit load, be it seq_cst or relaxed I'd rather see the fact it's an atomic op

auto ini = Now();
auto elapsed = 0;
// TODO: could use condition variable to avoid spurious lock + collect iterations
while ((Distance(m_previousCheckpoint, queryTicket) >= 0) && (elapsed < timeout_ms))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit memory ordering for the load please

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use m_previousCheckpoint.load(acquire) if you need to see latest value and rely on the fact timestamp is now free

Collect(lock, queryTicket, false);
elapsed = Now() - ini;
}
return Distance(m_previousCheckpoint, queryTicket) < 0;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want the latest value here, or the latest one you saw from the while loop?
Otherwise explicit memory ordering for the load please.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest value, because the loop could have been exited due to the timeout condition being reached.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use m_previousCheckpoint.load(acquire) if you need to see latest value and rely on the fact timestamp is now free

Comment thread public/tracy/TracyD3D12.hpp Outdated
@slomp slomp force-pushed the slomp/d3d12-ring branch from c97c895 to 223c9c9 Compare April 22, 2026 15:02
@slomp slomp force-pushed the slomp/d3d12-ring branch from 3f7901b to ac6980b Compare April 29, 2026 18:29
@slomp
Copy link
Copy Markdown
Collaborator Author

slomp commented Apr 29, 2026

@Lectem I addressed most of your remarks.

For the atomic memory ordering, I relaxed the two "hottest" paths: NextQueryId() (which gets called concurrently by multiple-threads), and RetireTicket() (which only gets called by whichever thread managed to acquire the lock to run Collect(), but the atomic store is called repeatedly therein).

I could relax a bit the other paths (Drain() and Wait()) but they are only ever called during "panic" situations, which should never or very rarely happen in a well-behaved program. If they do happen, I'd say the client has much bigger fish to fry other than worrying about the impact of strong atomics in cache synchronization. And if we were to relax, we'd need acq_rel semantics, which is not too far apart in terms of overhead from the clean/default sec_cst ordering semantics.

@slomp slomp force-pushed the slomp/d3d12-ring branch from f665e09 to 8692f2a Compare May 4, 2026 23:28
Comment thread public/tracy/TracyD3D12.hpp
{
friend class D3D12ZoneScope;

uint8_t m_contextId = 255; // 255 represents "invalid id"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a constant here and in CreateD3D12Context ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DirectX 12] Tracy GPU profiling deadlocks if the device is removed

2 participants