* Check through all managed objects and properly refcount them
  using shared_ptr.
* Ensure that we comb through the current code and enforce the distinction
  between user errors and program exceptions.
* Investigate using UMONITOR/UMWAIT for spinlocks to reduce busy-waiting
  stress/power consumption. Look for a parallel on ARM.
* Investigate WFE/SEV to reduce busy-waiting in spinlocks on ARM.
* The input arg `requiredLocks` to LockSet::LockSet() should be
  passed by reference, not by value. Propagate this upward into
  SerializedAsyncContin and into all derived classes'
  constructors.
* In livoxProto1/device.cpp, migrate the registerUdpCommandHandler() calls
  from using the inProgress collection to the per-device collections.
* Where we use boost deadline_timers and pass in an async contin to
  preserve context across the delay, but the timers aren't part of a
  branch pattern, we may still need to call cancel() on them after
  they expire, in case boost doesn't destroy the internal callable
  that we passed it. Otherwise we'll have circular shared_ptr
  references in our continuations.
* UdpCommandDemuxer::registerUdpCommandHandler should accept a pointer
  to the io_context of the thread it should post its callbacks to, and
  then post callbacks to those io_contexts when UDP cmd responses
  come in.
* We should make the LivoxProto1/LivoxGen1 libs use the UdpCmdDemuxer
  for the Pcloud and IMU data, in addition to the commands. This is required
  to support multiple LivoxProto1/Gen1 devices. Right now, one device
  will receive all of the pcloud/imu data dgrams if all the devices on
  the subnet are using the same port numbers for pcloud/imu data dgrams.
* Consider using huge pages (mmap() with MAP_HUGETLB, or madvise() with
  MADV_HUGEPAGE) for both PcloudStimBuff::StagingBuffer and the
  PcloudStimulusBuffer's ringbuff.
* We should probably call stream_descriptor::reset() after release()
  whenever we wish to release a desc without closing the underlying
  fd, because we've discovered that release() doesn't fully clean up
  internal metadata.
* There's a bug where deferred production timeslices can result in
  freezing. Explore this and figure out why. When we examined it,
  it didn't appear to be a spinlock deadlock.
  It seems to be reliably reproducible when we use the NVIDIA GTX
  card as our OpenCL ComputeDevice, since the GTX card doesn't
  have unified memory with the host CPU complex. This causes the
  kernels to overrun their timeslices and triggers repeated
  timeslice deferrals.
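
Sketch for the UdpCmdDemuxer item above (routing Pcloud/IMU data dgrams
per device). This is only a rough illustration of one way to stop a single
device from receiving every dgram when all devices share the same port
numbers: key the handlers on the sender's IP address plus the stream kind.
All names here (DataDgramDemuxer, DgramKey, StreamKind, registerHandler,
dispatch) are hypothetical and are not the existing UdpCommandDemuxer API.
Socket I/O and posting the callback to a per-thread io_context are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical names throughout; not the project's actual demuxer API.
enum class StreamKind { Pcloud, Imu };

// Lookup key: which device sent the dgram, and which data stream it is.
struct DgramKey {
    std::string srcIp;  // sender address as text, e.g. "192.168.1.101"
    StreamKind kind;

    bool operator<(const DgramKey &other) const {
        if (srcIp != other.srcIp) { return srcIp < other.srcIp; }
        return kind < other.kind;
    }
};

using DgramHandler =
    std::function<void(const std::uint8_t *data, std::size_t len)>;

class DataDgramDemuxer {
public:
    // Register (or replace) the handler for one device's data stream.
    void registerHandler(
        const std::string &srcIp, StreamKind kind, DgramHandler handler)
    {
        assert(handler && "handler must be callable");
        handlers_[DgramKey{srcIp, kind}] = std::move(handler);
    }

    // Route one received dgram to the handler registered for its sender.
    // Returns false if no device has claimed this sender/stream pair.
    bool dispatch(
        const std::string &srcIp, StreamKind kind,
        const std::uint8_t *data, std::size_t len) const
    {
        auto it = handlers_.find(DgramKey{srcIp, kind});
        if (it == handlers_.end()) { return false; }
        it->second(data, len);
        return true;
    }

private:
    std::map<DgramKey, DgramHandler> handlers_;
};
```

In this sketch each device object would call registerHandler() once per
stream with its own address, and the single receive loop on the shared port
would call dispatch() with the sender address of every dgram it reads.
Dgrams from unregistered senders are reported via the false return value
instead of being handed to whichever device happened to bind the port first.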
