OClCollMeshEngn: Add bridged delay in finalize()

See the diff of the todo file within this commit for more details.

In short, this prevents an in-flight async continuation from accessing
metadata that finalize() has already destroyed.
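
To illustrate the idea, here is a minimal sketch of a bridged delay using
only the C++ standard library. MiniLoop, MiniBridge, Engine and
kFinalizeDelayMs are hypothetical stand-ins invented for this example. They
are not the project's AsynchronousBridge, ComponentThread or io_service,
whose real interfaces are not shown here. The sketch assumes a
single-threaded handler queue: finalize() keeps running queued handlers
until a timer handler fires, so a continuation that was already queued runs
while the metadata is still alive.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical stand-in for an io_service: a single-threaded handler queue.
class MiniLoop {
public:
    using Clock = std::chrono::steady_clock;

    void post(std::function<void()> handler) {
        ready_.push_back(std::move(handler));
    }
    void postAfter(std::chrono::milliseconds delay, std::function<void()> handler) {
        timers_.push_back(Timer{Clock::now() + delay, std::move(handler)});
    }
    // Runs one ready handler or one expired timer; returns false when idle.
    bool pollOne() {
        if (!ready_.empty()) {
            auto handler = std::move(ready_.front());
            ready_.pop_front();
            handler();
            return true;
        }
        const auto now = Clock::now();
        for (auto it = timers_.begin(); it != timers_.end(); ++it) {
            if (it->deadline <= now) {
                auto handler = std::move(it->handler);
                timers_.erase(it);
                handler();
                return true;
            }
        }
        return false;
    }

private:
    struct Timer { Clock::time_point deadline; std::function<void()> handler; };
    std::deque<std::function<void()>> ready_;
    std::deque<Timer> timers_;
};

// Hypothetical bridge: blocks the caller but keeps pumping the loop meanwhile.
class MiniBridge {
public:
    explicit MiniBridge(MiniLoop& loop) : loop_(loop) {}
    void setComplete() { complete_ = true; }
    void waitForComplete() {
        while (!complete_) {
            if (!loop_.pollOne()) {
                std::this_thread::sleep_for(std::chrono::microseconds(100));
            }
        }
    }
private:
    MiniLoop& loop_;
    bool complete_ = false;
};

constexpr int kFinalizeDelayMs = 1;  // assumed value, mirrors the 1 ms #define

struct Engine {
    explicit Engine(MiniLoop& l) : loop(l) {}
    MiniLoop& loop;
    bool metadataAlive = true;

    void finalize() {
        assert(metadataAlive);  // this sketch does not support double finalize
        // The initializer-list overload of std::max accepts any number of delays.
        const int delayMs = std::max({kFinalizeDelayMs, 0});
        MiniBridge bridge(loop);
        loop.postAfter(std::chrono::milliseconds(delayMs),
                       [&bridge] { bridge.setComplete(); });
        bridge.waitForComplete();  // queued continuations run during this wait
        metadataAlive = false;     // only now is the state torn down
    }
};

// Queues a continuation, finalizes, and reports whether the continuation
// observed live metadata before teardown.
bool demoInFlightContinuationRunsBeforeTeardown() {
    MiniLoop loop;
    Engine engine(loop);
    bool sawLiveMetadata = false;
    loop.post([&] { sawLiveMetadata = engine.metadataAlive; });
    engine.finalize();
    return sawLiveMetadata && !engine.metadataAlive;
}
```

Under these assumptions, demoInFlightContinuationRunsBeforeTeardown()
returns true. A fixed delay only covers continuations that become runnable
within that window; the sketch does not model a kernel that outlives the
delay.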
2025-11-27 22:26:50 -04:00
parent d49594ef88
commit 313454c426
3 changed files with 46 additions and 13 deletions
@@ -6,8 +6,11 @@
 #include <vector>
 #include <string>
 #include <string_view>
+#include <algorithm>
 #include <boost/system/error_code.hpp>
+#include <boost/asio/deadline_timer.hpp>
 #include <asynchronousContinuation.h>
+#include <asynchronousBridge.h>
 #include <callback.h>
 #include <asynchronousLoop.h>
 #include <componentThread.h>
@@ -166,7 +169,46 @@ void OpenClCollatingAndMeshingEngine::finalize()
     // Complete any running kernels
     if (compactIsRunning) { compactKernelComplete(true); }
-    if (collateIsRunning) { collateKernelComplete(std::nullopt, std::nullopt, true); }
+    if (collateIsRunning) {
+        collateKernelComplete(std::nullopt, std::nullopt, true);
+    }
+    {
+        /** EXPLANATION:
+         * Calculate the delay as the maximum of the configured delay and any
+         * future delays. The 0 is a placeholder for any delays that will be
+         * introduced in the future. When new delays are added, they should be
+         * included in the std::max() call as an initializer list (e.g.,
+         * std::max({OCLCOLLMESH_ENGN_FINALIZE_DELAY_MS, futureDelay1,
+         * futureDelay2, 0})).
+         */
+        int delayMs = std::max(OCLCOLLMESH_ENGN_FINALIZE_DELAY_MS, 0);
+        auto& ioService = smoHooksPtr->ComponentThread_getSelf()->getIoService();
+        AsynchronousBridge bridge(ioService);
+        boost::asio::deadline_timer timeoutTimer(ioService);
+        /** EXPLANATION:
+         * We wait for delayMs milliseconds to ensure that any in-flight OpenCL
+         * kernel operations have definitely finished. OpenCL kernels cannot be
+         * cancelled once enqueued, so in-flight kernels may still be executing
+         * when finalize() is called. The delay ensures any running kernels
+         * complete and their callbacks execute before we destroy resources.
+         * This prevents use-after-free errors from resumed async continuations
+         * accessing destroyed state.
+         */
+        timeoutTimer.expires_from_now(
+            boost::posix_time::milliseconds(delayMs));
+        timeoutTimer.async_wait(
+            [&bridge](const boost::system::error_code& error)
+            {
+                (void)error;
+                // Always signal complete, whether timeout expired or was cancelled
+                bridge.setAsyncOperationComplete();
+            });
+        bridge.waitForAsyncOperationCompleteOrIoServiceStopped();
+    }
     // Release OpenCL buffers via smo hooks
     if (smoHooksPtr && smoHooksPtr->ComputeManager_releaseUseHostPtrBuffer)
@@ -21,6 +21,8 @@
 #include <user/compute.h>
 #include <user/senseApiDesc.h>
+#define OCLCOLLMESH_ENGN_FINALIZE_DELAY_MS 1
 namespace smo {
 namespace stim_buff {
+1 -12
@@ -38,18 +38,7 @@
 PcloudStimProducer::stop=>start() sequence:
-OpenClCollatingAndMeshingEngine::finalize():
-I'm also worried, though less so, about the OClCollMeshEngn: it's
-a lot less likely to have an in-flight op run past the point where
-the OClCollMeshEngn object has expired.
-* But there's still a chance that a long-running OCl kernel could
-cause an in-flight async contin to resume executing after its
-OClCollMeshEngn has expired.
-* We should do a bridged async wait for the std::max() of all
-timeouts used by OClCollMeshEngn to pass before leaving
-PcloudStimProducer::stop.
 Attaching and detaching StimBuffs from StimProducers:
 We've written code recently to attach and detach stimBuffs from a
 stimProducer. The code is quite nice, but there's this hanging
 omen over the fact that we put no thought into ensuring that