* Check through all managed objects and properly refcount them
  using shared_ptr.
* Comb through the current code and enforce the distinction
  between user errors and program exceptions.
* Investigate using UMONITOR/UMWAIT for spinlocks on x86 to reduce
  busy-waiting stress/power consumption. Look for a parallel on ARM.
* Investigate WFE/SEV to reduce busy-waiting in spinlocks on ARM.
* The input arg `requiredLocks` to LockSet::LockSet() should be passed
  by reference, not by value. Propagate this upward into
  SerializedAsyncContin and into all derived classes' constructors.
* In livoxProto1/device.cpp, migrate the registerUdpCommandHandler() calls
  from using the inProgress collection to the per-device collections.
* Where we use boost deadline_timers and pass in an async contin to
  preserve context across the delay, but the timer isn't part of a
  branch pattern, we may still need to call cancel() on it after it
  expires, just in case boost doesn't clean up the internal callable
  that we passed it. Otherwise we'll have circular sh_ptr references
  in our continuations.
* UdpCommandDemuxer::registerUdpCommandHandler should accept a pointer
  to the io_context of the thread its callbacks should run on, and
  then post the callbacks to that io_context when UDP cmd responses
  come in.
* Consider using huge pages (mmap() with MAP_HUGETLB, or madvise() with
  MADV_HUGEPAGE; there is no MAP_HUGEPAGE flag) for both
  PcloudStimBuff::StagingBuffer and the PcloudStimulusBuffer's ringbuff.
* We should probably call stream_descriptor::reset() after release()
  whenever we wish to release a desc without closing the underlying
  fd, because we've discovered that release() doesn't fully clean up
  internal metadata.
* There's a bug where deferred production timeslices can result in
  freezing. Explore this and figure out why. When we examined it,
  it didn't appear to be a spinlock deadlock.
  It seems to be reliably reproducible when we use the NVidia GTX
  card as our OpenCL ComputeDevice, since the GTX card doesn't
  have unified memory with the host CPU complex. This causes the
  kernels to overrun their timeslices and triggers repeated
  timeslice deferrals.
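For the `requiredLocks` item above, a minimal sketch of the signature change. The `Lock` type, the std::vector container, the member names and `ExampleContin` are placeholders rather than the project's real declarations, and `const&` is an assumption since the item only says "a ref".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Lock {};  // placeholder for the real lock type

class LockSet {
public:
    // Takes the caller's collection by const reference; the only copy made
    // is the one into the member.
    explicit LockSet(const std::vector<Lock*>& requiredLocks)
        : locks_(requiredLocks) {}
    std::size_t size() const { return locks_.size(); }
private:
    std::vector<Lock*> locks_;
};

class SerializedAsyncContin {
public:
    // Forwards the same reference instead of copying at this level too.
    explicit SerializedAsyncContin(const std::vector<Lock*>& requiredLocks)
        : lockSet_(requiredLocks) {}
    virtual ~SerializedAsyncContin() = default;
    std::size_t nRequiredLocks() const { return lockSet_.size(); }
private:
    LockSet lockSet_;
};

// Hypothetical derived continuation: passes the reference upward unchanged.
class ExampleContin : public SerializedAsyncContin {
public:
    explicit ExampleContin(const std::vector<Lock*>& requiredLocks)
        : SerializedAsyncContin(requiredLocks) {}
};
```

With by-value parameters each constructor level would copy the collection once more; here it is copied once, into LockSet.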
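For the deadline_timer item above, a Boost-free model of the failure mode it worries about. `StickyTimer` is a made-up stand-in that deliberately keeps its handler after firing; it says nothing about what boost actually does, it only shows why a retained callable would form a cycle and why cancel() would break it.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Stand-in for a timer that keeps its completion handler until cancelled.
// This is NOT boost::asio::deadline_timer; it models the worst case only.
struct StickyTimer {
    std::function<void()> handler;
    void asyncWait(std::function<void()> h) { handler = std::move(h); }
    void fire() { if (handler) handler(); }  // handler stays stored afterwards
    void cancel() { handler = nullptr; }     // destroys the stored callable
};

struct Contin : std::enable_shared_from_this<Contin> {
    StickyTimer timer;
    bool resumed = false;

    void armDelay() {
        // The handler owns the continuation, and the continuation owns the
        // timer that stores the handler: a shared_ptr cycle.
        auto self = shared_from_this();
        timer.asyncWait([self] { self->resumed = true; });
    }
};
```

In this model the continuation stays alive after the last external sh_ptr is dropped, until someone who still holds a reference calls timer.cancel().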
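For the huge pages item above, a Linux-only sketch of one way to allocate such a buffer. `mapBuffer` is a hypothetical helper, not an existing function in the codebase, and it assumes the requested size is a multiple of the huge page size (2 MiB is typical on x86-64).

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper. nBytes is assumed to be a multiple of the huge page
// size, since munmap() of a MAP_HUGETLB mapping requires an aligned length.
inline void* mapBuffer(std::size_t nBytes) {
    const int prot = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    // First try explicit huge pages; this fails if none are reserved.
    void* p = mmap(nullptr, nBytes, prot, flags | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) return p;

    // Fall back to normal pages and ask for transparent huge pages.
    p = mmap(nullptr, nBytes, prot, flags, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    (void)madvise(p, nBytes, MADV_HUGEPAGE);  // best effort; result ignored
    return p;
}
```

The fallback path matters because MAP_HUGETLB only succeeds when the system has huge pages reserved, which is a deployment setting rather than something the program controls.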

PcloudStimProducer::stop=>start() sequence:
IoUringAssemblyEngine::finalize():
I'm worried that calling PcloudStimProducer::stop() will leave
in-flight sequences running, which will remain alive even after
the PcloudStimProducer object itself has been destroyed. This may
be possible for IoUringAssmEngn because it has a running timer
which may well just time out.
* There's no reason to think that an in-flight IoUringAssmEngn
  assembly operation won't actually run until it times out. In
  fact, that's the standard case if you configure
  nDgramsPerFrame to be large enough.
* This means that when we call IoUringAssmEngn::finalize(), an
  in-flight assembly could be going on which isn't receiving
  any CQE notifications on the eventFd. Thus, that in-flight
  assembly op could plausibly time out and resume execution
  after IoUringAssmEngn::finalize() has completed.
* We ought to do a bridged async timeout for the std::max()
  of all timeouts used by IoUringAssmEngn.
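The std::max() idea in the last item can be sketched as below. `bridgeTimeout` and the `Ms` alias are illustrative names, and the timeout values in the usage line are invented; the real engine's timeouts would be passed in instead.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <initializer_list>

using Ms = std::chrono::milliseconds;

// Longest of the engine's timeouts. After waiting this long, every timer
// that was armed before the wait started has expired (its handler may
// still need to finish running).
inline Ms bridgeTimeout(std::initializer_list<Ms> engineTimeouts) {
    assert(engineTimeouts.size() > 0);  // std::max needs a non-empty list
    return std::max(engineTimeouts);
}
```

For example, bridgeTimeout({Ms(5), Ms(250), Ms(40)}) yields 250 ms, which would be the delay to bridge over before finalize() is allowed to complete.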

OpenClCollatingAndMeshingEngine::finalize():
I'm also worried, though less so, about the OClCollMeshEngn: it's
a lot less likely to have an in-flight op run past the point where
the OClCollMeshEngn object has expired.
* But there's still a chance that a long-running OCl kernel could
  cause an in-flight async contin to resume executing after its
  OClCollMeshEngn has expired.
* We should do a bridged async wait for the std::max() of all
  timeouts used by OClCollMeshEngn to pass before leaving
  PcloudStimProducer::stop().

Attaching and detaching StimBuffs from StimProducers:
We've written code recently to attach and detach stimBuffs from a
stimProducer. The code is quite nice, but there's this hanging
omen over the fact that we put no thought into ensuring that
detachment doesn't cause an in-flight async production op to
access invalid data.

The in-flight async production ops use the SpMcRingbuffs that
inhabit the stimbuffs. If we don't ensure that all in-flight
async ops are retired before we detach a stimbuff from a
producer, we could end up with the producer writing data into
memory which has been reclaimed and repurposed.
Similarly, if we're not careful about the order in which we
assign the stimBuff pointers during attachment, we could
potentially cause producers to see a partially initialized
StimBuff object.

I think this can be solved without locking/synchronization
by being very careful to ensure that by the time
StimProducer::stop() exits, all in-flight production
operations are reasonably sure to be halted. If all
in-flight operations are halted, and if production ops
cannot be launched while a StimBuff is being
attached/detached, then we don't have to worry about
accesses to stale StimBuff instance state or to partially
initialized StimBuff instance state.

So this problem is solved by dealing with the in-flight
cancelation problem described above, concerning
[IoUringAssmEngn|OClCollMeshEngn]::start/stop() and
StimulusBuffer::start/stop(), and ensuring that after
stop() has returned, we can be reasonably sure that all
in-flight ops have exited.
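The ordering rule above (attach/detach only while stopped) can be sketched as follows. `StimProducerSketch` and its members are hypothetical, and the sketch assumes start(), stop(), attach() and detach() are all called from one control thread and that stop() returns only once in-flight ops have exited, which is the open problem these notes describe.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct StimulusBuffer {};  // placeholder for the real buffer type

class StimProducerSketch {
public:
    void start() { running_ = true; }
    // Assumed to return only once in-flight ops are reasonably sure to
    // have exited; the waiting itself is not modelled here.
    void stop() { running_ = false; }

    // Refuses to swap the buffer while production may be in flight.
    bool attach(std::shared_ptr<StimulusBuffer> buff) {
        if (running_ || !buff) return false;
        buff_ = std::move(buff);
        return true;
    }
    bool detach() {
        if (running_) return false;
        buff_.reset();
        return true;
    }
    bool hasBuffer() const { return buff_ != nullptr; }

private:
    bool running_ = false;  // touched by the control thread only (assumption)
    std::shared_ptr<StimulusBuffer> buff_;
};
```

The guard makes the invariant explicit: a caller has to go through stop() before detach() can succeed, so the buffer pointer never changes underneath a running producer.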

Making sh_ptr<StimulusBuffer> atomic for mem barriers:
We could also complete our implementation's correctness by converting
the sh_ptrs to StimulusBuffer inside of the PCloudStimulusProducer
into std::atomic<std::shared_ptr<StimulusBuffer>>, and using
std::memory_order_release/memory_order_acquire when writing and
reading them respectively.