Replace the current delay timeout mechanism with a spinlock.
Both mechanisms try to eliminate the possibility of an in-flight
async op accessing state that has been destroyed by stop().
The spinlock is the less arbitrary of the two, since it does not
depend on a guessed delay value.
See the diff of the todo file within this commit for more details.
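
As a rough illustration of the idea (not the code in this commit), the
sketch below makes stop() spin until every in-flight op has drained
instead of sleeping for a fixed delay. The class name Component and the
enter()/leave() hooks are hypothetical. It uses the default seq_cst
ordering on purpose: the "increment, then check the flag" and "set the
flag, then check the counter" pair is not safe with plain
acquire/release.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for an object whose state async ops may touch.
class Component {
public:
    // Called at the top of every async op. Returns false once stop() has
    // begun, in which case the op must return without touching any state.
    bool enter() {
        inFlight_.fetch_add(1);
        if (stopping_.load()) {
            inFlight_.fetch_sub(1);
            return false;
        }
        return true;
    }

    // Called when an async op that entered successfully is done.
    void leave() {
        const int previous = inFlight_.fetch_sub(1);
        assert(previous > 0);
        (void)previous;  // silence unused warning when NDEBUG is defined
    }

    // Refuses new ops, then spins until the ones already in flight drain.
    void stop() {
        stopping_.store(true);
        while (inFlight_.load() != 0) {
            std::this_thread::yield();
        }
        // From here on it is safe to destroy state.
    }

    int inFlight() const { return inFlight_.load(); }

private:
    std::atomic<int> inFlight_{0};
    std::atomic<bool> stopping_{false};
};
```

An async op would wrap its body in `if (!c.enter()) return; ... c.leave();`
so that stop() cannot return while the body is still running.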
In short, this prevents an in-flight async continuation from
accessing metadata that we've already destroyed once finalize()
has been called.
See the diff of the todo file within this patch for more
details.
This eliminates the possibility of an in-flight async continuation
accessing metadata that we destroyed in finalize().
These two classes represent our first foray into stencil
construction. One of them standardizes PcloudAmbience stencils
across all stimbuffs, and the other specifies the internal
memory constraints and requirements for a LivoxGen1 device's
stencils.
This change is a bit pedantic, but these vars aren't accessed in any
hot path, so the extra cost is fine. We made these sh_ptrs atomic so
that loads and stores of them can use acquire and release semantics.
This doesn't eliminate the problem of seeing inconsistent state across
microcontrollers, but it helps with simple accesses like the ones we
already do.
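
For reference, a minimal sketch of acquire/release access to a shared
pointer, assuming std::shared_ptr and the standard atomic free
functions. StimBufferStub, g_buffer, publish() and readFrameCount() are
hypothetical names, not the ones in the tree. C++20 also offers
std::atomic<std::shared_ptr<T>> and deprecates these free functions, so
treat this as one possible spelling only.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload type standing in for a stimulus buffer.
struct StimBufferStub {
    int frameCount = 0;
};

// Only ever touched through the atomic free functions below.
std::shared_ptr<StimBufferStub> g_buffer;

// Writer: fully construct the object first, then publish the pointer
// with release semantics so readers see the initialized fields.
void publish(int frameCount) {
    assert(frameCount >= 0);
    auto fresh = std::make_shared<StimBufferStub>();
    fresh->frameCount = frameCount;
    std::atomic_store_explicit(&g_buffer, std::move(fresh),
                               std::memory_order_release);
}

// Reader: take a snapshot with acquire semantics. Returns -1 if nothing
// has been published yet.
int readFrameCount() {
    auto snapshot =
        std::atomic_load_explicit(&g_buffer, std::memory_order_acquire);
    return snapshot ? snapshot->frameCount : -1;
}
```

The snapshot keeps the object alive for the duration of the read, even
if another thread publishes a replacement in the meantime.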
Reduces code duplication, centralizes checking and enforces consistent
behaviour across producers.
Also reordered the writes to the sh_ptr<StimulusBuffer>s such that
the pointers are written last.
PcloudStimulusBuffer::produceFrameReq():
Now correctly produces into the stim frames for the
PcloudIntensityStimulusBuffer object that's attached to the
PcloudStimulusProducer. If no intensity stimbuff is attached, then
the OpenCL kernel will simply not write out the intensity data.
This is the first moment when we actually use the SP-MC ringbuffer
properly and actually cycle through the frames, producing into
them one by one.
This now ensures that finalizeReq is indeed called from mrntt,
since exception-experiencing threads will post an exceptionInd
to mrntt, which will then call finalizeReq.
This ensures that we can avoid races when adding and removing
stimbuffs to a stimproducer.
At least in theory. I can think of some ways in which this current
design may result in races or other bad conditions.
Slots whose stride size is larger than the slot alignment value
should have their size rounded up to the next multiple of the
alignment, so that the slots that follow them will also be aligned.
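
The rounding rule can be sketched as below. The function names are
made up for illustration, and the helper rounds every size rather than
only those larger than the alignment, which is a superset of the case
described above.

```cpp
#include <cassert>
#include <cstddef>

// Rounds size up to the next multiple of alignment. alignment must be
// non-zero; it does not need to be a power of two.
constexpr std::size_t roundUpToAlignment(std::size_t size,
                                         std::size_t alignment) {
    return ((size + alignment - 1) / alignment) * alignment;
}

// Byte offset of slot `index` when every slot is padded to the
// alignment, so each slot start is itself a multiple of the alignment.
std::size_t slotOffset(std::size_t index, std::size_t strideSize,
                       std::size_t alignment) {
    assert(alignment > 0);
    return index * roundUpToAlignment(strideSize, alignment);
}
```

With a 100-byte stride and 64-byte alignment, each slot occupies 128
bytes, so slot 2 starts at byte 256 rather than at the unaligned 200.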
We added a new centralized OpenCL Compute manager. This can later
be extended to support CUDA, SYCL, etc. SMO can be configured at
build time to choose which API it will use for compute.
Moreover, the ComputeMgr allows us to register buffers which are
available to all cl_contexts.
We now allocate all the stimFrames for a StimBuffer using a
single StagingBuffer. This gives us all the benefits we're
looking for (pinning, alignment, etc).
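
A rough sketch of carving all frames out of one allocation, assuming
C++17 aligned operator new. FramePool is a hypothetical name and is not
the StagingBuffer class. It only shows the alignment side; pinning is
not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical pool: one aligned allocation, split into equal frames.
class FramePool {
public:
    // alignment must be a power of two, as aligned operator new requires.
    FramePool(std::size_t frameBytes, std::size_t frameCount,
              std::size_t alignment)
        : alignment_(alignment),
          stride_(((frameBytes + alignment - 1) / alignment) * alignment),
          count_(frameCount),
          base_(static_cast<unsigned char*>(::operator new(
              stride_ * count_, std::align_val_t(alignment)))) {}

    ~FramePool() { ::operator delete(base_, std::align_val_t(alignment_)); }

    FramePool(const FramePool&) = delete;
    FramePool& operator=(const FramePool&) = delete;

    // Start of frame i. Every frame start is aligned because the base is
    // aligned and the stride is a multiple of the alignment.
    unsigned char* frame(std::size_t i) const {
        assert(i < count_);
        return base_ + i * stride_;
    }

    std::size_t stride() const { return stride_; }

private:
    // Declaration order matters: members are initialized in this order.
    std::size_t alignment_;
    std::size_t stride_;
    std::size_t count_;
    unsigned char* base_;
};
```

Because there is a single allocation, any per-allocation property
(alignment here, pinning in the real buffer) is paid for once and
applies to every frame.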