OClCollMeshEngn: Add bridged delay in finalize()
See the diff of the todo file within this commit for more details. In short, we do this to prevent the possibility of an in-flight async contin accessing metadata that we've already destroyed after finalize() has been called.
This commit is contained in:
@@ -6,8 +6,11 @@
|
||||
#include <vector>
|
||||
#include <string>
|
||||
#include <string_view>
|
||||
#include <algorithm>
|
||||
#include <boost/system/error_code.hpp>
|
||||
#include <boost/asio/deadline_timer.hpp>
|
||||
#include <asynchronousContinuation.h>
|
||||
#include <asynchronousBridge.h>
|
||||
#include <callback.h>
|
||||
#include <asynchronousLoop.h>
|
||||
#include <componentThread.h>
|
||||
@@ -166,7 +169,46 @@ void OpenClCollatingAndMeshingEngine::finalize()
|
||||
|
||||
// Complete any running kernels
|
||||
if (compactIsRunning) { compactKernelComplete(true); }
|
||||
if (collateIsRunning) { collateKernelComplete(std::nullopt, std::nullopt, true); }
|
||||
if (collateIsRunning) {
|
||||
collateKernelComplete(std::nullopt, std::nullopt, true);
|
||||
}
|
||||
|
||||
{
|
||||
/** EXPLANATION:
|
||||
* Calculate the delay as the maximum of the configured delay and any
|
||||
* future delays. The 0 is a placeholder for any delays that will be
|
||||
* introduced in the future. When new delays are added, they should be
|
||||
* included in the std::max() call (e.g., std::max(
|
||||
* OCLCOLLMESH_ENGN_FINALIZE_DELAY_MS, futureDelay1, futureDelay2, 0)).
|
||||
*/
|
||||
int delayMs = std::max(OCLCOLLMESH_ENGN_FINALIZE_DELAY_MS, 0);
|
||||
|
||||
auto& ioService = smoHooksPtr->ComponentThread_getSelf()->getIoService();
|
||||
AsynchronousBridge bridge(ioService);
|
||||
boost::asio::deadline_timer timeoutTimer(ioService);
|
||||
|
||||
/** EXPLANATION:
|
||||
* We wait for delayMs milliseconds to ensure that any in-flight OpenCL
|
||||
* kernel operations have definitely finished. OpenCL kernels cannot be
|
||||
* cancelled once enqueued, so in-flight kernels may still be executing
|
||||
* when finalize() is called. The delay ensures any running kernels
|
||||
* complete and their callbacks execute before we destroy resources.
|
||||
* This prevents use-after-free errors from resumed async continuations
|
||||
* accessing destroyed state.
|
||||
*/
|
||||
timeoutTimer.expires_from_now(
|
||||
boost::posix_time::milliseconds(delayMs));
|
||||
timeoutTimer.async_wait(
|
||||
[&bridge](const boost::system::error_code& error)
|
||||
{
|
||||
(void)error;
|
||||
|
||||
// Always signal complete, whether timeout expired or was cancelled
|
||||
bridge.setAsyncOperationComplete();
|
||||
});
|
||||
|
||||
bridge.waitForAsyncOperationCompleteOrIoServiceStopped();
|
||||
}
|
||||
|
||||
// Release OpenCL buffers via smo hooks
|
||||
if (smoHooksPtr && smoHooksPtr->ComputeManager_releaseUseHostPtrBuffer)
|
||||
|
||||
@@ -21,6 +21,8 @@
|
||||
#include <user/compute.h>
|
||||
#include <user/senseApiDesc.h>
|
||||
|
||||
#define OCLCOLLMESH_ENGN_FINALIZE_DELAY_MS 1
|
||||
|
||||
namespace smo {
|
||||
namespace stim_buff {
|
||||
|
||||
|
||||
@@ -38,18 +38,7 @@
|
||||
|
||||
PcloudStimProducer::stop=>start() sequence:
|
||||
|
||||
OpenClCollatingAndMeshingEngine::finalize():
|
||||
I'm also worried, though less so, about the OClCollMeshEngn: it's
|
||||
a lot less likely to have an in-flight op run past the point where
|
||||
the OClCollMeshEngn object has expired.
|
||||
* But there's still a chance that a long-running OCl kernel could
|
||||
cause an in-flight async contin to resume executing after its
|
||||
OclCollMeshEngn has expired.
|
||||
* We should do a bridged async wait for the std::max() of all
|
||||
timeouts used by OClCollMeshEngn to pass before leaving
|
||||
PcloudStimProducer::stop.
|
||||
|
||||
Attaching and detaching StimBuffs from StimProducers:
|
||||
Attaching and detaching StimBuffs from StimProducers:
|
||||
We've written code recently to attach and detact stimBuffs from a
|
||||
stimProducer. The code is quite nice, but there's this hanging
|
||||
omen over the fact that we put no thought into ensuring that
|
||||
|
||||
Reference in New Issue
Block a user