* Check through all managed objects and make sure they are properly refcounted using shared_ptr.
* Comb through the current code and enforce the distinction between user errors and program exceptions.
* Investigate using UMONITOR/UMWAIT for spinlocks to reduce busy-waiting stress and power consumption. Look for a parallel on ARM.
* Investigate WFE/SEV to reduce busy-waiting in spinlocks on ARM.
* The input arg `requiredLocks` to LockSet::LockSet() should be passed by reference, not by value. Propagate this upward into SerializedAsyncContin and into the constructors of all derived classes.
* In livoxProto1/device.cpp, migrate the registerUdpCommandHandler() calls from using the inProgress collection to the per-device collections.
* Where we use boost deadline_timers and pass in an async contin to preserve context across the delay, and the timers aren't part of a branch pattern, we may still need to call cancel() on them after they expire, in case boost doesn't clean up the internal callable that we passed it. Otherwise we'll have circular shared_ptr references in our continuations.
* UdpCommandDemuxer::registerUdpCommandHandler should accept a pointer to the io_context of the thread that its callbacks should run on, and then post callbacks to that io_context when UDP cmd responses come in.
* Consider using huge pages (MAP_HUGETLB via mmap(), or MADV_HUGEPAGE via madvise(), since Linux has no MAP_HUGEPAGE flag) with both PcloudStimBuff::StagingBuffer and the PcloudStimulusBuffer's ringbuff.
* We should probably call stream_descriptor::reset() after release() whenever we wish to release a desc without closing the underlying fd, because we've discovered that release() doesn't fully clean up internal metadata.
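
For the `requiredLocks` item above, a minimal sketch of what the by-reference signature could look like once it is propagated through the constructor chain. Everything here is a simplified, hypothetical stand-in: `LockList`, `ExampleDerivedContin`, `size()` and `lockCount()` are invented for illustration, and the real LockSet and SerializedAsyncContin certainly carry more state. It also assumes that a const reference is what is wanted and that LockSet keeps its own copy of the list.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for whatever type `requiredLocks` really is.
// It counts copy-constructions so the effect of the signature is visible.
struct LockList {
    static int& copyCount() { static int n = 0; return n; }

    std::vector<int> ids;

    LockList(std::initializer_list<int> init) : ids(init) {}
    LockList(const LockList& other) : ids(other.ids) { ++copyCount(); }
    LockList(LockList&&) noexcept = default;
};

// Simplified LockSet: takes the list by const reference and makes the
// one copy it needs to own (assumption: LockSet stores the list).
class LockSet {
public:
    explicit LockSet(const LockList& requiredLocks) : locks_(requiredLocks) {}
    std::size_t size() const { return locks_.ids.size(); }

private:
    LockList locks_;
};

// Simplified SerializedAsyncContin: forwards the reference, no copy here.
class SerializedAsyncContin {
public:
    explicit SerializedAsyncContin(const LockList& requiredLocks)
        : lockSet_(requiredLocks) {}
    virtual ~SerializedAsyncContin() = default;
    std::size_t lockCount() const { return lockSet_.size(); }

private:
    LockSet lockSet_;
};

// Hypothetical derived continuation: also forwards by const reference.
class ExampleDerivedContin : public SerializedAsyncContin {
public:
    explicit ExampleDerivedContin(const LockList& requiredLocks)
        : SerializedAsyncContin(requiredLocks) {}
};
```

With every level taking `const LockList&`, constructing an `ExampleDerivedContin` copies the list once, inside LockSet. If each constructor took the argument by value without moving it, every level would add another copy.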