GPU: D3D12: remove NewFrame() API + drop unresolved queries when under pressure#1316
GPU: D3D12: remove NewFrame() API + drop unresolved queries when under pressure#1316slomp wants to merge 63 commits intowolfpld:masterfrom
Conversation
4ce6662 to
bf3694e
Compare
2a1400b to
ee3ba91
Compare
| TracyD3D12Assert( lock.owns_lock() ); | ||
| TracyD3D12Debug( ZoneValue(m_contextId) ); | ||
|
|
||
| uint64_t earliestTicket = m_previousCheckpoint; |
There was a problem hiding this comment.
If the only writes done to m_previousCheckpoint are done behind the collect lock, you can make this a relaxed load
There was a problem hiding this comment.
explicit load, be it seq_cst or relaxed I'd rather see the fact it's an atomic op
| auto ini = Now(); | ||
| auto elapsed = 0; | ||
| // TODO: could use condition variable to avoid spurious lock + collect iterations | ||
| while ((Distance(m_previousCheckpoint, queryTicket) >= 0) && (elapsed < timeout_ms)) |
There was a problem hiding this comment.
explicit memory ordering for the load please
There was a problem hiding this comment.
Please use m_previousCheckpoint.load(acquire) if you need to see latest value and rely on the fact timestamp is now free
| Collect(lock, queryTicket, false); | ||
| elapsed = Now() - ini; | ||
| } | ||
| return Distance(m_previousCheckpoint, queryTicket) < 0; |
There was a problem hiding this comment.
Do you want the latest value here, or the latest one you saw from the while loop?
Otherwise explicit memory ordering for the load please.
There was a problem hiding this comment.
Latest value, because the loop could have been exited due to the timeout condition being reached.
There was a problem hiding this comment.
Please use m_previousCheckpoint.load(acquire) if you need to see latest value and rely on the fact timestamp is now free
|
@Lectem I addressed most of your remarks. For the atomic memory ordering, I relaxed the two "hottest" paths: I could relax a bit the other paths ( |
…t makeshift timestamps for dropped timestamps
| { | ||
| friend class D3D12ZoneScope; | ||
|
|
||
| uint8_t m_contextId = 255; // 255 represents "invalid id" |
There was a problem hiding this comment.
Could we use a constant here and in CreateD3D12Context ?
This PR removes the need for calling
NewFrame()periodically.(it also addresses and fixes #1294)
Some applications may run "headless" or may use the GPU for compute-only purposes, and do not have the concept of a frame boundary. Timestamp queries are now produced and collected in circular fashion.
In addition, the code is now resilient to situations where timestamp queries are reserved/dispensed, but the enclosing command list is never submitted to the GPU for execution. Such cases will flag warning messages and will "stall" progress, forcing "unresolvable" timestamps queries to be dropped. This also tricky covers corner cases where timestamp query ids are dispensed in one order, but the command lists are sent to the GPU queue in opposing order, or when an operation takes unusually long to complete with respect to everything else.
As such, queue signals are no longer required, nor useful: even with Signal/Wait, there are no guarantees that once the queue sets the signal value, all queries up to that value have indeed completed, since queries could be sent/executed out of the order. in which they were generated, or may not have been submitted to the queue yet (or ever).
The only scenario where ambiguity can still happen is as follows:
While technically possible, it is quite a contrived case. There does not seem to be a proper solution for the case above without some amount of costly query tracking along with some CPU-GPU synchronization on top of it. The current implementation is also prone to the same "bad-timing" problem anyway.