OClCollMeshEngn: Add bridged delay in finalize()

See the diff of the todo file within this commit for more details.

In short, we do this to prevent the possibility of an in-flight async
contin accessing metadata that we've already destroyed after finalize()
has been called.
This commit is contained in:
2025-11-27 22:26:50 -04:00
parent d49594ef88
commit 313454c426
3 changed files with 46 additions and 13 deletions
@@ -6,8 +6,11 @@
#include <vector>
#include <string>
#include <string_view>
#include <algorithm>
#include <boost/system/error_code.hpp>
#include <boost/asio/deadline_timer.hpp>
#include <asynchronousContinuation.h>
#include <asynchronousBridge.h>
#include <callback.h>
#include <asynchronousLoop.h>
#include <componentThread.h>
@@ -166,7 +169,46 @@ void OpenClCollatingAndMeshingEngine::finalize()
// Complete any running kernels
if (compactIsRunning) { compactKernelComplete(true); }
if (collateIsRunning) { collateKernelComplete(std::nullopt, std::nullopt, true); }
if (collateIsRunning) {
collateKernelComplete(std::nullopt, std::nullopt, true);
}
{
/** EXPLANATION:
* Calculate the delay as the maximum of the configured delay and any
* future delays. The 0 is a placeholder for any delays that will be
* introduced in the future. When new delays are added, they should be
* included in the std::max() call (e.g., std::max(
* OCLCOLLMESH_ENGN_FINALIZE_DELAY_MS, futureDelay1, futureDelay2, 0)).
*/
int delayMs = std::max(OCLCOLLMESH_ENGN_FINALIZE_DELAY_MS, 0);
auto& ioService = smoHooksPtr->ComponentThread_getSelf()->getIoService();
AsynchronousBridge bridge(ioService);
boost::asio::deadline_timer timeoutTimer(ioService);
/** EXPLANATION:
* We wait for delayMs milliseconds to ensure that any in-flight OpenCL
* kernel operations have definitely finished. OpenCL kernels cannot be
* cancelled once enqueued, so in-flight kernels may still be executing
* when finalize() is called. The delay ensures any running kernels
* complete and their callbacks execute before we destroy resources.
* This prevents use-after-free errors from resumed async continuations
* accessing destroyed state.
*/
timeoutTimer.expires_from_now(
boost::posix_time::milliseconds(delayMs));
timeoutTimer.async_wait(
[&bridge](const boost::system::error_code& error)
{
(void)error;
// Always signal complete, whether timeout expired or was cancelled
bridge.setAsyncOperationComplete();
});
bridge.waitForAsyncOperationCompleteOrIoServiceStopped();
}
// Release OpenCL buffers via smo hooks
if (smoHooksPtr && smoHooksPtr->ComputeManager_releaseUseHostPtrBuffer)
@@ -21,6 +21,8 @@
#include <user/compute.h>
#include <user/senseApiDesc.h>
#define OCLCOLLMESH_ENGN_FINALIZE_DELAY_MS 1
namespace smo {
namespace stim_buff {
+1 -12
View File
@@ -38,18 +38,7 @@
PcloudStimProducer::stop=>start() sequence:
OpenClCollatingAndMeshingEngine::finalize():
I'm also worried, though less so, about the OClCollMeshEngn: it's
a lot less likely to have an in-flight op run past the point where
the OClCollMeshEngn object has expired.
* But there's still a chance that a long-running OCl kernel could
cause an in-flight async contin to resume executing after its
OclCollMeshEngn has expired.
* We should do a bridged async wait for the std::max() of all
timeouts used by OClCollMeshEngn to pass before leaving
PcloudStimProducer::stop.
Attaching and detaching StimBuffs from StimProducers:
Attaching and detaching StimBuffs from StimProducers:
We've written code recently to attach and detact stimBuffs from a
stimProducer. The code is quite nice, but there's this hanging
omen over the fact that we put no thought into ensuring that