Todo: Update

By solving the issues in finalize() for IoUringAssmEngn and
OClCollMeshEngn, we've solved this as a side effect.
2025-11-27 22:29:05 -04:00
parent 313454c426
commit 1e76d51c41
-36
@@ -35,39 +35,3 @@
have unified memory with the host CPU complex. This causes the
kernels to overrun their timeslices and triggers repeated
timeslice deferrals.
PcloudStimProducer::stop=>start() sequence:
Attaching and detaching StimBuffs from StimProducers:
We've recently written code to attach and detach StimBuffs
from a StimProducer. The code is quite nice, but one worry
hangs over it: we put no thought into ensuring that
detachment doesn't cause an in-flight async production op to
access invalid data.
The in-flight async production ops use the SpMcRingbuffs that
live inside the StimBuffs. If we don't ensure that all
in-flight async ops are retired before we detach a StimBuff
from a producer, the producer could end up writing data into
memory that has been reclaimed and repurposed.
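A minimal C++ sketch of "retire before detach", assuming each async op can be counted when it is launched and when it finishes: the producer keeps an in-flight counter, and detach() spins until that counter drains to zero before handing the buffer back. `DetachingProducerSketch`, `StimBuffSketch`, `launchOp()` and the one-`std::thread`-per-op model are all hypothetical stand-ins; the real ops go through IoUringAssmEngn/OClCollMeshEngn and SpMcRingbuffs, which are not modelled here. On its own this does not stop a new launch from racing a detach; that is the stop() side of the problem.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a StimBuff; the real one holds SpMcRingbuffs.
struct StimBuffSketch {
    std::vector<int> ring = std::vector<int>(64, 0);
};

// Hypothetical producer that counts its in-flight async ops so detach()
// never hands back a buffer an op may still be writing into.
// Assumes launchOp() and detach() are called from one control thread.
class DetachingProducerSketch {
public:
    void attach(StimBuffSketch *b) { buff_ = b; }

    // Launch one async "production op" that writes into the attached buffer.
    // The count is bumped before the op starts, so detach() cannot miss it.
    std::thread launchOp(std::size_t slot) {
        inFlight_.fetch_add(1);
        StimBuffSketch *b = buff_;
        return std::thread([this, b, slot] {
            b->ring[slot % b->ring.size()] = static_cast<int>(slot);
            // Retire with release so the detacher sees the write as complete.
            inFlight_.fetch_sub(1, std::memory_order_release);
        });
    }

    // Wait until every launched op has retired, then drop the pointer.
    StimBuffSketch *detach() {
        while (inFlight_.load(std::memory_order_acquire) != 0)
            std::this_thread::yield();
        StimBuffSketch *old = buff_;
        buff_ = nullptr;
        return old;
    }

    int inFlight() const { return inFlight_.load(); }

private:
    StimBuffSketch *buff_ = nullptr;
    std::atomic<int> inFlight_{0};
};
```

Once detach() returns, the caller owns the buffer outright and can reclaim or repurpose it.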
Similarly, if we're not careful about the order in which we
assign the StimBuff pointers during attachment, producers
could see a partially initialized StimBuff object.
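One conventional ordering for attachment, shown as a minimal C++ sketch: fully initialize the buffer first, then publish its pointer with a release store, and have producers read it with an acquire load. A producer that sees a non-null pointer is then guaranteed to see every initializing write. `AttachSketch`, `StimBuffSketch`, `peek()` and the `capacity`/`ready` fields are hypothetical; this illustrates the ordering rule, not the actual StimProducer code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical StimBuff with state that must be complete before use.
struct StimBuffSketch {
    std::size_t capacity = 0;
    std::vector<int> ring;
    bool ready = false;
};

class AttachSketch {
public:
    // Build the buffer completely, THEN publish it. The release store
    // orders all initializing writes before the pointer becomes visible.
    // (Storing the pointer first would let a producer see half-built state.)
    void attach(StimBuffSketch *b, std::size_t cap) {
        b->capacity = cap;
        b->ring.assign(cap, 0);
        b->ready = true;
        buff_.store(b, std::memory_order_release);
    }

    // Producer side: the acquire load pairs with the release store above,
    // so a non-null result is always fully initialized.
    StimBuffSketch *peek() const {
        return buff_.load(std::memory_order_acquire);
    }

private:
    std::atomic<StimBuffSketch *> buff_{nullptr};
};
```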
I think this can be solved without locking/synchronization
by being very careful to ensure that by the time
StimProducer::stop() exits, all in-flight production
operations are reasonably sure to have halted. If all
in-flight operations are halted, and if production ops
cannot be launched while a StimBuff is being attached or
detached, then we don't have to worry about accesses to
stale StimBuff instance state, or to partially initialized
StimBuff instance state.
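A minimal C++ sketch of the stop() contract described above, assuming every launch passes through a single gate function. It uses atomic counters and flags rather than mutexes. launch() increments the in-flight count *before* checking the running flag, and stop() clears the flag *before* draining the count. With the default sequentially consistent ordering, a launch racing stop() either is seen and waited for, or sees the flag cleared and backs out. `StoppableProducerSketch`, `launch()`, `quiescent()` and the thread-pool stand-in are hypothetical names, not the real StimProducer API.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical producer: launch() is refused once stop() has begun, and
// stop() returns only after every op it did not refuse has retired.
class StoppableProducerSketch {
public:
    void start() { running_.store(true); }

    // Returns false (and launches nothing) if the producer is stopped.
    // Increment first, then check the flag: with seq_cst ordering, either
    // stop() sees this increment and waits for it, or this call sees
    // running_ == false and backs out.
    bool launch(std::vector<std::thread> &pool, std::atomic<int> &work) {
        inFlight_.fetch_add(1);
        if (!running_.load()) {
            inFlight_.fetch_sub(1);
            return false;
        }
        pool.emplace_back([this, &work] {
            work.fetch_add(1);      // stand-in for a production op
            inFlight_.fetch_sub(1); // retire
        });
        return true;
    }

    // After this returns, no op is in flight and none can start, so a
    // StimBuff can be attached or detached without further locking.
    void stop() {
        running_.store(false);
        while (inFlight_.load() != 0)
            std::this_thread::yield();
    }

    bool quiescent() const { return !running_.load() && inFlight_.load() == 0; }

private:
    std::atomic<bool> running_{false};
    std::atomic<int> inFlight_{0};
};
```

The ordering inside launch() is the design choice that matters: checking the flag before incrementing would leave a window where stop() observes a zero count and returns while a launch is about to begin.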
So this problem is solved by dealing with the in-flight
cancellation problem described above, concerning
[IoUringAssmEngn|OClCollMeshEngn]::start/stop() and
StimulusBuffer::start/stop(), and by ensuring that after
stop() has returned, we can be reasonably sure that all
in-flight ops have exited.