Docs: New qutexes design

2025-09-18 20:29:37 -04:00
parent 9a23dbbe95
commit b49e281010
1 changed files with 518 additions and 0 deletions
@@ -0,0 +1,518 @@
+I just realized that my spinqueueing mechanism is highly power-inefficient if a
+lock needs to be held across a "true async wait", i.e. one where the async
+sequence actually waits on a hardware bottleneck. In this case, the thread
+acquires the spinlock, then goes to sleep in the kernel schedq until some
+hardware event occurs, and is then awakened, all while still holding the
+spinlock.
+
+Meanwhile, other sequences running on other threads and contending for that lock
+will be Qspinning. This is acceptable if all I care about is maximum throughput:
+the Qspinning just re-posts the sequences back into the Q, and eventually
+they'll acquire the LockSet and proceed.
+
+Importantly, since the thread itself isn't slept in the kschedQ, it will be
+deQing and processing other sequences that aren't bottlenecked on the lock held
+by the sequence waiting for the hardware response. Throughput is indeed
+maximized.
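A minimal single-threaded sketch of Qspinning, to make the mechanism concrete. It makes two assumptions, and neither is the real implementation:

- A hypothetical `WorkQueue` (a std::deque of std::function) stands in for the CompThread's asio queue.
- A single std::atomic<bool> stands in for the contended lock.

A sequence that fails its try-lock re-posts itself to the back of the queue instead of blocking, so unrelated work queued behind it still runs:

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a CompThread's queue (hypothetical; the real design uses asio).
struct WorkQueue {
    std::deque<std::function<void()>> q;

    void post(std::function<void()> fn) { q.push_back(std::move(fn)); }

    // Runs up to maxSteps queued handlers in FIFO order; returns how many ran.
    int run(int maxSteps) {
        int n = 0;
        while (!q.empty() && n < maxSteps) {
            std::function<void()> fn = std::move(q.front());
            q.pop_front();
            fn();
            ++n;
        }
        return n;
    }
};

std::atomic<bool> lockHeld{false};  // Stand-in for the contended lock.

bool tryLock() {
    bool expected = false;
    return lockHeld.compare_exchange_strong(expected, true);
}

// A sequence that needs the lock. On failure it Qspins: it re-posts itself to
// the back of the queue instead of blocking the thread.
void lockedSequence(WorkQueue &wq, std::vector<std::string> &events, int &nRetries) {
    if (!tryLock()) {
        ++nRetries;
        wq.post([&wq, &events, &nRetries] { lockedSequence(wq, events, nRetries); });
        return;
    }
    events.push_back("locked work");
    lockHeld.store(false);  // Release.
}
```

Each failed attempt costs a dequeue plus a re-post. That is exactly the CPU (and power) being spent while the lock holder sleeps on hardware.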
+
+However, I just realized that if kernel mutexes expose FD events, I can apply
+this same logic to sleeplocks: I can wait on the sleeplocks asynchronously
+instead of synchronously. If I can make my asio::io_service wait on all the
+mutex FDs requested by sequences on the current thread, then in theory I can put
+the thread to sleep and know that when the mutex becomes available, I'll be
+awakened again.
+
+Hence, I can get the best of both worlds: maximum throughput and power saving.
+Instead of spinqueueing, we just add the lock FDs to an FD set to be waited on
+by asio. If any of those locks become available, the kernel scheduler will
+awaken our asioQ thread, and we can then retry acquiring the lock.
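pthread mutexes don't expose an FD. As a stand-in, here is a sketch of a lock whose availability *is* an FD event, using the pipe-token trick (the same idea as GNU make's jobserver):

- The pipe holds one byte while the lock is free.
- Acquiring reads the byte.
- Releasing writes it back.

The hypothetical `PipeLock` is not part of any library. Its read end could be handed to poll(2), to epoll, or to asio (e.g. via a `posix::stream_descriptor`) alongside the other lock FDs.

```cpp
#include <cassert>
#include <fcntl.h>
#include <poll.h>
#include <stdexcept>
#include <unistd.h>

// Hypothetical PipeLock: the lock is free iff its pipe holds one token byte.
class PipeLock {
public:
    PipeLock() {
        if (pipe(fds_) != 0) { throw std::runtime_error("pipe() failed"); }
        // Non-blocking reads: tryAcquire() must never put the thread to sleep.
        fcntl(fds_[0], F_SETFL, fcntl(fds_[0], F_GETFL) | O_NONBLOCK);
        release();  // Start out unlocked by depositing the single token.
    }
    ~PipeLock() { close(fds_[0]); close(fds_[1]); }
    PipeLock(const PipeLock &) = delete;
    PipeLock &operator=(const PipeLock &) = delete;

    // Takes the token if present; read() fails with EAGAIN if someone holds it.
    bool tryAcquire() {
        char token;
        return read(fds_[0], &token, 1) == 1;
    }

    // Puts the token back, making the read end readable (POLLIN) again.
    void release() {
        const char token = 'L';
        if (write(fds_[1], &token, 1) != 1) { throw std::runtime_error("write() failed"); }
    }

    // The FD to wait on: readable means "worth retrying tryAcquire()".
    int waitFd() const { return fds_[0]; }

private:
    int fds_[2];
};

// Returns true if the lock's FD becomes readable within timeoutMs.
bool waitUntilRetryable(const PipeLock &l, int timeoutMs) {
    pollfd p{l.waitFd(), POLLIN, 0};
    return poll(&p, 1, timeoutMs) == 1 && (p.revents & POLLIN);
}
```

Readability only means "worth retrying". Several waiters can be woken by the same token and only one wins the read(), so the losers go back to waiting on the FD.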
+
+## Boost asio queue-based sleep locking:
+
+Instead of using FDs, we can also try a FIFO-queue-based mechanism: each lock
+is a spinlock plus a FIFO queue of waiting sequences.
+
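+Acquire:
+```
+lock(spinlock);
+q.push_back(self);
+head = q.peek_front();
+if (head == self) {
+	// We're at the front: we acquired the lock.
+	unlock(spinlock);
+	return true;
+}
+
+// Not at the front: we stay queued, and whoever releases the lock ahead of
+// us will post our continuation back onto our thread's io_service.
+unlock(spinlock);
+return false;
+```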
+
+Release:
+```
+lock(spinlock);
+// The front entry should be ourselves, the current owner.
+q.pop_front();
+// Wake up the next request in the q.
+head = q.peek_front();
+if (head == NULL) {
+	// Nobody was waiting.
+	unlock(spinlock);
+	return;
+}
+
+// head now owns the lock: hand it over by posting its continuation.
+head->thread_to_wake.getIoService().post([head]{
+	// This lambda causes thread_to_wake to check this lock's
+	// Q and then proceed to execute since it now owns the lock.
+});
+unlock(spinlock);
+```
+
+Something like this: it causes the entire thing to be, at least ostensibly,
+in userspace -- though idk how Boost handles its queues internally.
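A runnable sketch of the acquire/release pair above, under two assumptions:

- A std::mutex stands in for the spinlock.
- Each waiter is a hypothetical `Waiter` whose `post` callback plays the role of `thread_to_wake.getIoService().post(...)`.

A waiter that isn't at the front just returns. Whoever releases ahead of it posts its continuation later.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical waiter: `post` schedules a handler on the waiter's own thread
// (its io_service, in the real design); `resume` runs once it owns the lock.
struct Waiter {
    std::function<void(std::function<void()>)> post;
    std::function<void()> resume;
};

class FifoSleepLock {
public:
    // Returns true if `w` owns the lock now. Otherwise `w` stays queued and is
    // resumed via w->post() once it reaches the front. `w` must outlive that.
    bool acquire(Waiter *w) {
        std::lock_guard<std::mutex> g(m_);
        q_.push_back(w);
        return q_.front() == w;
    }

    // Must be called by the current owner, i.e. the front of the queue.
    void release() {
        Waiter *next = nullptr;
        {
            std::lock_guard<std::mutex> g(m_);
            q_.pop_front();              // Remove ourselves.
            if (q_.empty()) { return; }  // Nobody was waiting.
            next = q_.front();           // The new owner.
        }
        // Hand ownership over by posting the new owner's continuation.
        next->post(next->resume);
    }

private:
    std::mutex m_;            // Stand-in for the per-lock spinlock.
    std::deque<Waiter *> q_;  // FIFO of waiters; front() owns the lock.
};
```

One design choice differs from the pseudocode: the post happens after dropping the mutex. This is safe because the new front entry owns the lock and can't leave the queue until it releases.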
+
+## Prioritizing LockSets:
+
+One problem we have with a FIFO-based sleeping system is that it makes it very
+unlikely that LockSets will ever acquire all of their locks, if there are
+contenders for those same locks who only need to acquire one of the locks in
+that LockSet.
+
+We could theoretically give LockSets an advantage by not making them back out
+if they fail to acquire all locks in their set. I.e: if they get 2/3, then they
+hold those 2 and then wait for the 3rd. This is problematic because it leaves
+room for deadlocks in the form of T1 and T2 both needing LockA and LockB, but
+acquiring them in opposite orders. I.e: T1 takes LockA and now waits for
+LockB; and T2 takes LockB and now waits for LockA. This will happen among
+the LockSets if we don't impose backing out. It may be possible to avoid this
+using very careful lock ordering and dependency analysis, but this project is
+asynchronous: the locking is done in the async sequences and not in the sync
+accessor functions, so this kind of analysis is almost impossible to do.
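The "back out" rule is what std::try_lock already does for ordinary mutexes. It try-locks its arguments in order. If one fails, it unlocks the ones it got and returns the failing argument's index; it returns -1 on success.

Below is a sketch of the T1/T2 scenario with real threads, each taking LockA and LockB in opposite orders. A failed attempt holds nothing, so neither thread can sit on one lock while waiting for the other. This only illustrates the rule: the sequences in this design are async, not threads, and `incrementWithBothLocks` is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

std::mutex lockA, lockB;
int sharedCounter = 0;  // Protected by holding BOTH lockA and lockB.

// All-or-nothing acquisition with backoff. std::try_lock returns -1 on success;
// on failure it has already unlocked whatever it took, so we hold nothing.
void incrementWithBothLocks(std::mutex &first, std::mutex &second, int times) {
    for (int i = 0; i < times;) {
        if (std::try_lock(first, second) != -1) {
            std::this_thread::yield();  // Backed out; let the other side run.
            continue;
        }
        ++sharedCounter;
        first.unlock();
        second.unlock();
        ++i;
    }
}
```

Backing out rules out deadlock, not starvation. Nothing stops one side (or a stream of single-lock contenders) from winning every round, which is the problem the next paragraph is about.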
+
+
+We need to think of a way to make the FIFOs biased toward LockSets so that they
+have an advantage over single-lock acquirers; otherwise, LockSet sequences will
+be starved.
+
+### Timed backoff:
+
+We could have LockSets be greedy and try to hold on to the locks they've
+acquired (say, 2/3 and then wait for the 3rd) but then be forced to back off
+after a timeout.
+
+This introduces async event complexity, and the timeout we choose is almost
+guaranteed to be arbitrary.
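A sketch of greedy acquisition with timed backoff, using std::timed_mutex::try_lock_for() as the bounded wait. `tryAcquireAllGreedy` and the 20 ms timeout in the usage are hypothetical, and the timeout is exactly the arbitrary knob mentioned above. With real threads the wait blocks; in the async design it would be a timer event instead, which is where the extra complexity comes from.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Greedily acquires every lock in `set`, in order, waiting up to `timeout` on
// each. On the first timeout it backs off by releasing everything it holds.
// Precondition: the caller owns none of these locks (re-locking a
// std::timed_mutex you already own is undefined behavior).
bool tryAcquireAllGreedy(std::vector<std::timed_mutex *> &set,
                         std::chrono::milliseconds timeout) {
    std::size_t nAcquired = 0;
    for (auto *m : set) {
        if (!m->try_lock_for(timeout)) { break; }
        ++nAcquired;
    }
    if (nAcquired == set.size()) { return true; }
    for (std::size_t i = 0; i < nAcquired; ++i) { set[i]->unlock(); }
    return false;
}
```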
+
+### Fractionally inserted FIFOs:
+
+We insert sequences with a LockSet.size() of 1 at the back.
+We insert all other sequences (LockSet.size() > 1) at the 1/LockSet.size() mark
+of the queue.
+So a LockSet of size 2 will be inserted at the end of the first half of the
+items in the queue.
+A LockSet of size 3 will be inserted at the end of the first 33% of items.
+A LockSet of size 4 will be inserted at the end of the first 25% of items.
+And so on.
+
+This ensures that larger LockSets are prioritized progressively more, while
+not completely hogging the queue: single-lock sequences that have already
+progressed past a given LockSet size's fraction-mark will keep advancing
+toward the front.
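A sketch of the insertion rule on a std::vector (the container the impl notes below settle on). `Seq` and `fractionalInsert` are hypothetical. Integer division rounds the mark down, so a size-2 LockSet joining a 5-item queue lands at index 2:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Seq {
    std::string name;
    std::size_t lockSetSize;
};

// LockSet.size() == 1: append at the back. Otherwise insert at the
// 1/LockSet.size() mark, i.e. right after the first size()/lockSetSize items.
void fractionalInsert(std::vector<Seq> &queue, const Seq &s) {
    if (s.lockSetSize <= 1) {
        queue.push_back(s);
        return;
    }
    const std::size_t mark = queue.size() / s.lockSetSize;
    queue.insert(queue.begin() + static_cast<std::ptrdiff_t>(mark), s);
}
```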
+
+For queueing sequences with LockSet.size() > 1, we can enQ them on the FIFO of
+the first lock in their set. They'll back off each time anyway, so they'll
+always be re-trying from the first lock in their set each time.
+
+#### Impl details:
+
+We'd like to avoid std::vector, since inserting mid-queue means shifting every
+later item over, but we'll have to use it anyway: we need direct access to
+insert at arbitrary fractional indexes, and a linked list would have to walk
+to that index first. (std::unordered_set is out regardless, since it keeps no
+order at all.) It's unlikely the number of items in any lock's Q will ever be
+large enough to require lots of displacement, but welp there's no reason not
+to plan for scaling. Although if we end up needing scaling, that's a symptom of
+a bigger problem...with scaling itself lol. There shouldn't be enough items
+blocked on a lock that we have to design the lock's queue to be scalable.
+
+### Inverted Fractionally acquired locksets:
+
+The previous idea of fractionally inserted lockQs was okay, but the acquisition
+algo required that the async seq be at the front of a lock's queue to
+successfully acquire that lock. That makes it almost impossible for
+LockSets > 1 to ever acquire all of their locks. If we add backoff to that, it
+basically means no LockSet will ever acquire all of its locks.
+
+Instead what we now do is always insert at the rear (push_back()) and then when
+acquiring, we check whether the sequence is in the first
+(1 - 1/LockSet.size()) of the queue, and if so, it successfully acquires the
+lock. I.e: if the sequence item isn't in the LAST 1/LockSet.size() of the
+items, then it succeeds.
+* For a lockset of size=1: It must be at the front of the queue.
+* Lockset.size=2: it must be in the first 50% of items.
+* Lockset.size=3: it must be in the first 66% of items.
+* Lockset.size=4: It must be in the first 75% of items.
+
+So this way larger LockSets are favoured, but 1-size locksets make progress.
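The rule can be written as a predicate on the sequence's 0-based position in the lockQ. This hypothetical `mayAcquire` uses the same integer arithmetic as the rear scan in TicketLock::tryAcquire below: it excludes the last queueSize / LockSet.size() entries. Size-1 LockSets are handled as plain FIFO, per the list above:

```cpp
#include <cassert>

// position: 0-based index of the sequence in the lockQ (0 == front).
bool mayAcquire(int position, int queueSize, int lockSetSize) {
    if (lockSetSize == 1) {
        return position == 0;  // Single-lock sequences: plain FIFO.
    }
    // Exclude the last queueSize / lockSetSize entries (integer division).
    const int nRearExcluded = queueSize / lockSetSize;
    return position < queueSize - nRearExcluded;
}
```

Because of the rounding, small queues favour LockSets even more: a size-4 LockSet always passes on a 3-item queue.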
+
+For performance:
+* We can obv just scan the smaller tail fraction for the item instead of
+  scanning the larger front fraction.
+* If we use a doubly-linked list (std::list), we can keep the iterator we got
+  at insertion time: std::list iterators stay valid until their own element is
+  erased, so we won't have to search the lockQ for the item when we eventually
+  remove it on lock release.
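Both notes can be sketched with std::list, whose iterators stay valid until their own element is erased. The sketch assumes the lockQ stores pointers to a hypothetical `Seq` type. It keeps the iterator from push_back(), scans only the tail when checking, and erases in O(1) on release:

```cpp
#include <cassert>
#include <iterator>
#include <list>

struct Seq { int id; };

using LockQ = std::list<Seq *>;

// Appends `s` at the rear and returns a handle that stays valid until erased.
LockQ::iterator enqueue(LockQ &q, Seq *s) {
    q.push_back(s);
    return std::prev(q.end());
}

// True if `s` is among the last `nTail` entries; only the tail is scanned.
bool inTail(const LockQ &q, const Seq *s, int nTail) {
    auto it = q.rbegin();
    for (int i = 0; i < nTail && it != q.rend(); ++i, ++it) {
        if (*it == s) { return true; }
    }
    return false;
}
```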
+
+## Total overall design:
+
+### Asio queues and Lockvokers:
+
+Lockvokers are initially enqueued on a CompThread's queue. When the lockvoker
+first runs, it checks a flag to see if it has been "registered" into the queues
+for all locks in its set. If not, then it "registers" itself in each lock's
+ticketQ and then attempts to acquire each lock. Registration and acquisition
+are logically separate operations, and lockvokers will often attempt
+acquisition many times after first registering, without needing to register
+again. Ideally we can implement a LockSet::registerAndTryAcquireAll() method,
+but that's for us to think about later.
+
+```cpp
+#include <list>
+#include <stdexcept>
+#include <utility>
+#include <vector>
+
+class TicketLock;
+class LockerAndInvoker;
+
+class LockSet
+{
+	/* Add this either inside of LockSet or outside of it, depending on
+	 * whether we can get it to compile: I'm seeing some potential circular
+	 * definition dependencies. Since a LockDesc only holds a pointer and an
+	 * iterator, the forward declarations above should be enough.
+	 *
+	 * first: the TicketLock. second: the lockvoker's entry in that lock's
+	 * ticketQ, as returned by TicketLock::registerLockvoker().
+	 */
+	typedef std::pair<TicketLock *, std::list<LockerAndInvoker *>::iterator>
+		LockDesc;
+
+	/* Find a LockDesc -- useful below */
+	LockDesc &getLockDesc(TicketLock &criterionLock)
+	{
+		for (auto &reqLock: locks) {
+			if (reqLock.first == &criterionLock) { return reqLock; }
+		}
+
+		// Should never happen.
+		throw std::logic_error("getLockDesc(): lock not in this LockSet");
+	}
+};
+
+// NB: `register` is a reserved keyword in C++, so it can't be a method name.
+void LockSet::registerLockvoker(LockerAndInvoker &lockvoker)
+{
+	for (auto &lock: locks) {
+		// Register the Lockvoker object in each lock's ticketQ, keeping the
+		// returned iterator so release() can erase our entry in O(1).
+		lock.second = lock.first->registerLockvoker(lockvoker);
+	}
+	registered = true;
+}
+
+bool LockSet::tryAcquire(LockerAndInvoker &lockvoker)
+{
+	if (!registered) {
+		// Should never happen.
+		throw std::logic_error("tryAcquire() called before registration");
+	}
+	const int nLocksInSet = static_cast<int>(locks.size());
+	int nLocksAcquired = 0;
+	for (auto &lock: locks) {
+		if (!lock.first->tryAcquire(lockvoker, nLocksInSet)) {
+			break;
+		}
+
+		nLocksAcquired++;
+	}
+
+	if (nLocksAcquired == nLocksInSet) {
+		// Success
+		return true;
+	}
+
+	for (int i=0; i<nLocksAcquired; i++) {
+		// Backoff does different stuff from release(): it keeps our entry
+		// in the ticketQ (rotating it if we're at the front).
+		locks[i].first->backoff(lockvoker);
+	}
+	return false;
+}
+
+void LockSet::release(LockerAndInvoker &lockvoker)
+{
+	for (auto &lock: locks) {
+		lock.first->release(lockvoker);
+	}
+	// Our ticketQ entries (and the iterators to them) are gone now.
+	registered = false;
+}
+```
+
+Now, the TicketLock class is what we'll use for synchronization. It's just a
+combination of a SpinLock (itself essentially an atomic<bool>), an
+atomic<bool> isOwned flag, and a std::list serving as the ticketQ.
+
+```cpp
+#include <algorithm>
+#include <atomic>
+#include <cstddef>
+#include <iterator>
+#include <list>
+#include <stdexcept>
+
+class SpinLock
+{
+	/* Modify to add methods acquire() and release() which busy-wait.
+	 */
+public:
+	void acquire()
+	{
+		// Busy-wait until we are the ones who flip `held` from false to true.
+		while (held.exchange(true, std::memory_order_acquire)) {}
+	}
+	void release() { held.store(false, std::memory_order_release); }
+
+private:
+	std::atomic<bool> held{false};
+};
+
+class LockSet
+{
+	/* Modify the std::vector of SpinLock to instead be:
+	 *	std::vector<LockDesc> locks;
+	 */
+	std::vector<LockDesc> locks;
+};
+
+/* Posts the given lockvoker back onto its CompThread's io_service so that it
+ * re-runs and retries its LockSet.
+ */
+void wakeUp(LockerAndInvoker *lockvoker);
+
+class TicketLock
+{
+public:
+	/* The ticketQ holds pointers, not copies: with a
+	 * std::list<LockerAndInvoker>, push_back() would copy the lockvoker and
+	 * the address comparisons in tryAcquire()/backoff() could never match.
+	 */
+	typedef std::list<LockerAndInvoker *>::iterator QIterator;
+
+	// NB: `register` is a reserved keyword in C++, hence the longer name.
+	QIterator registerLockvoker(LockerAndInvoker &lockvoker)
+	{
+		/** EXPLANATION:
+		 * Just insert the lockvoker into the rear of the list.
+		 *
+		 * Then, since we want to store the iterator to the new entry in the
+		 * caller's LockDesc, we return it. std::list iterators stay valid
+		 * until their own element is erased, so release() can later remove
+		 * the entry without searching the ticketQ.
+		 */
+		QIterator it;
+
+		lock.acquire();
+		queue.push_back(&lockvoker);
+		it = std::prev(queue.end());
+		lock.release();
+
+		return it;
+	}
+
+	void unregister(QIterator it)
+	{
+		lock.acquire();
+		queue.erase(it);
+		lock.release();
+	}
+
+	bool tryAcquire(LockerAndInvoker &tryingLockvoker, int nRequiredLocks)
+	{
+		lock.acquire();
+		const int qNItems = static_cast<int>(queue.size());
+
+		if (qNItems < 1) {
+			lock.release();
+
+			/**	EXPLANATION:
+			 * A Lockvoker registers in all of its requiredLocks before ever
+			 * trying to tryAcquire() them, so if tryAcquire is being called,
+			 * that must mean that queue.size() > 0.
+			 *
+			 * Ergo this should never happen.
+			 */
+			throw std::logic_error("tryAcquire() on an empty ticketQ");
+		}
+
+		if (isOwned) {
+			lock.release();
+			return false;
+		}
+
+		if (nRequiredLocks == 1) {
+			/* A single-lock sequence must be at the front of the ticketQ.
+			 * The rear scan below can't express this: with nRequiredLocks==1
+			 * it would scan the whole queue and never succeed.
+			 */
+			if (queue.front() != &tryingLockvoker) {
+				lock.release();
+				return false;
+			}
+			isOwned.store(true);
+			lock.release();
+			return true;
+		}
+
+		/**	EXPLANATION:
+		 * From here:
+		 * If qNItems == 1 then we are the only one in the ticketQ and we have
+		 *	successfully acquired the lock.
+		 * If qNItems / nRequiredLocks == 0, then we acquire by default since
+		 *	the number of items in the ticketQ guarantees that we are in the top
+		 *	X% for that nRequiredLocks.
+		 * If qNItems / nRequiredLocks >= 1, then we must do the normal algo:
+		 *	Check the last (qNItems/nRequiredLocks) items, and if the item isn't
+		 *	in those items, then it must be in the earlier ones (obviously).
+		 *	Hence this Lockvoker acquisition should be considered successful.
+		 *
+		 *	EXPLANATION 2:
+		 * You'll notice that we don't do actual percentages but rather we just
+		 * do discrete fractions -- this makes the algo more deterministic
+		 * and much easier to reason about. I.e:
+		 *	If nRequiredLocks is 6 and qNItems==3:
+		 *		we don't actually calculate that the Lockvoker item must be in
+		 *		the top (100-17%), and then try to calculate whether we ought to
+		 *		consider the 3rd item to be in the last 17-percentile. We just
+		 *		do a fractional count (3 / 6 == 0 rear items to exclude) and
+		 *		assume complete discreteness.
+		 */
+		const int nRearItemsToScan = qNItems / nRequiredLocks;
+
+		if (qNItems == 1 || nRearItemsToScan < 1) {
+			isOwned.store(true);
+			lock.release();
+			return true;
+		}
+
+		auto rIt = queue.rbegin();
+		auto rEndIt = queue.rend();
+		bool foundInRear = false;
+		for (int i=0; i<nRearItemsToScan && rIt != rEndIt; rIt++, i++)
+		{
+			if (*rIt != &tryingLockvoker) { continue; }
+
+			foundInRear = true;
+			break;
+		}
+
+		if (foundInRear) {
+			lock.release();
+			return false;
+		}
+
+		/* Not found in rear: this means the item is in the top X%. That means
+		 * it should be allowed to claim the lock.
+		 */
+		isOwned.store(true);
+		lock.release();
+		return true;
+	}
+
+	void backoff(LockerAndInvoker &failedAcquirer)
+	{
+		lock.acquire();
+
+		// failedAcquirer had acquired this lock; give it up.
+		isOwned.store(false);
+
+		// Rotate queue members if failedAcquirer is at front of queue.
+		if (queue.front() == &failedAcquirer)
+		{
+			/**	EXPLANATION:
+			 * Rotate the top LockSet.size() items in the queue by moving
+			 * the failedAcquirer to the last position in the top
+			 * LockSet.size() items within the queue.
+			 *
+			 * I.e: if queue.size()==20, and lockSet.size()==5, then move
+			 * failedAcquirer from the front to the 5th position in the queue,
+			 * which should push the other 4 items forward.
+			 * If queue.size()==3 and LockSet.size()==5, then just
+			 * push_back(failedAcquirer).
+			 *
+			 * It is impossible for a TicketLock queue to have only one
+			 * item in it, yet for that Lockvoker item to have failed to
+			 * acquire the ticketLock. Being the only item in the ticketQ
+			 * means that you must succeed at acquiring the TicketLock.
+			 * (It can still back off from such a queue when a *different*
+			 * lock in its set failed; the rotation is then a no-op.)
+			 */
+			const size_t swapPosition = std::min<size_t>(
+				queue.size(),
+				failedAcquirer.serializedContinuation.requiredLocks.size());
+
+			/*	EXPLANATION:
+			 * Swap them here.
+			 *
+			 * The reason why we do this swap is to avoid a particular kind of
+			 * deadlock wherein a grid of async requests is perfectly configured
+			 * so as to guarantee that none of them can make any forward
+			 * progress unless they get reordered.
+			 *
+			 * Consider 2 different locks with 2 different items in them
+			 * each, both of which come from 2 particular requests:
+			 *	TicketLock1: Lockvoker1, Lv2
+			 *	TicketLock2: Lv2, Lv1
+			 *
+			 * Moreover, both of these lockvokers have requiredLocks.size()==2,
+			 * and the particular 2 locks that each one requires are indeed
+			 * TicketLock1 and TicketLock2.
+			 *
+			 * This particular setup basically means that in TL1's queue, Lv1
+			 * will wake up since it's at the front of TL1. It'll successfully
+			 * acquire TL1 (since it's at the front), and then it'll try to
+			 * acquire TL2. But since Lv1 isn't in the top 50% of items in TL2's
+			 * queue, Lv1 will fail to acquire TL2.
+			 *
+			 * Then similarly, in TL2's queue, Lv2 will wake up since it's at
+			 * the front. Again, it'll successfully acquire TL2 since it's at
+			 * the front of TL2's queue. But then it'll try to acquire TL1.
+			 * Since it's not in the top 50% of TL1's enqueued items, it'll fail
+			 * to acquire TL1.
+			 *
+			 * N.B: This type of perfectly ordered deadlock can occur in any
+			 * kind of NxN situation where ticketQ.size()==requiredLocks.size().
+			 * That could be 4x4, 5x5, 6x6, etc. It doesn't happen in 1x1
+			 * because a Lockvoker that only requires one lock will always just
+			 * succeed if it's at the front of its queue.
+			 *
+			 * This state of affairs is stable and will persist unless these
+			 * queues are reordered in some way. Hence: that's why we rotate the
+			 * items in a TicketLockQ after backing off of it. Backing off means
+			 * not necessarily that the calling Lockvoker failed to acquire
+			 * THIS PARTICULAR TicketLock, but rather that it failed to acquire
+			 * ALL of its required locks.
+			 *
+			 * Hence, if we are backing out, we should also rotate the items
+			 * in the queue if the current front item is the failed acquirer.
+			 * So that's why we do this rotation here.
+			 *
+			 * The rotation itself: splice() relinks the front node to just
+			 * before index swapPosition, i.e. into the swapPosition-th slot.
+			 * When swapPosition == queue.size(), that's end(), i.e. a
+			 * push_back().
+			 */
+			queue.splice(
+				std::next(queue.begin(),
+					static_cast<std::ptrdiff_t>(swapPosition)),
+				queue, queue.begin());
+		}
+
+		LockerAndInvoker *newFront = queue.front();
+
+		lock.release();
+
+		wakeUp(newFront);
+	}
+
+	void release(LockerAndInvoker &prevOwner)
+	{
+		lock.acquire();
+
+#ifndef CONFIG_LOCKVOKER_AGGRESSIVE_WAKEUPS
+		LockerAndInvoker	*oldFront = queue.front();
+#endif
+
+		/* Erase directly: calling unregister() here would re-acquire `lock`,
+		 * which we already hold, and self-deadlock on the SpinLock.
+		 */
+		queue.erase(prevOwner.serializedContinuation.requiredLocks.getLockDesc(
+			*this).second);
+		isOwned.store(false);
+
+		/** NOTE:
+		 * I am not sure whether we should only wake up the front item if
+		 * the prev owner was the previous front item; or whether we should
+		 * always wake up the new front item.
+		 *
+		 * Recall that because a sequence can acquire a ticketLock without being
+		 * at the front of the queue (because it could merely be in the top X%
+		 * of items instead), this means that during a call to release(), the
+		 * owning async sequence may not be at the front -- it need only be in
+		 * the top X% of items.
+		 *
+		 * When the owning sequence is the front, then we should definitely wake
+		 * the new front item after removing the previous owner.
+		 *
+		 * But when the front item is not the prev owner, then should we wake it
+		 * up? It should have been awakened previously, so if it's still at the
+		 * front, that implies that it failed to make forward progress last time
+		 * it awoke. You could argue that during the interim while this owner
+		 * did its thing, it's possible for the current front item to have
+		 * become capable of acquiring all of its requiredLocks, due to changes
+		 * in the other locks in its requiredLocks set. This is a fair argument.
+		 *
+		 * You could also argue that we can just wait until its registered
+		 * items in its other locks' ticketQs reach the front of those other
+		 * ticketQs, and then when that happens, those locks will wake it up
+		 * when release() is called.
+		 *
+		 * The latter eases contention since one could argue that we have a
+		 * surer chance of it successfully acquiring all of its requiredLocks if
+		 * we wait until it bubbles to the front of another ticketQ. Whereas,
+		 * if we aggressively/greedily awaken it to try just because an item in
+		 * another ticketQ has been removed, we're just introducing contention.
+		 * Both cases have good arguments. The aggressive approach enables us to
+		 * potentially retire more requests, and thus increase throughput.
+		 *
+		 * This current pseudocode assumes the latter and waits for the other
+		 * locks' ticketQs to wake it up when it reaches the top in their
+		 * Qs. (Defining CONFIG_LOCKVOKER_AGGRESSIVE_WAKEUPS selects the former.)
+		 */
+		if (queue.empty()) {
+			// Nobody else is waiting on this lock.
+			lock.release();
+			return;
+		}
+
+		LockerAndInvoker *newFront = queue.front();
+
+		lock.release();
+
+#ifndef CONFIG_LOCKVOKER_AGGRESSIVE_WAKEUPS
+		if (newFront != oldFront)
+#endif
+			{ wakeUp(newFront); }
+	}
+
+public:
+	SpinLock						lock;
+	std::atomic<bool>				isOwned{false};
+	std::list<LockerAndInvoker *>	queue;
+};
+```
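The rotation in backoff() can be checked on its own. This sketch works on a std::list of hypothetical lockvoker IDs and uses std::list::splice to move the front entry into slot min(queue.size(), LockSet.size()). It replays the TicketLock1/TicketLock2 example from the comment: once Lv1 backs off TL1, Lv2 is at the front of both queues and can take both locks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Moves the front entry into the min(q.size(), lockSetSize)-th slot (1-based),
// shifting the entries it passes one step toward the front.
void rotateFrontOnBackoff(std::list<int> &q, std::size_t lockSetSize) {
    if (q.size() < 2) { return; }  // Nothing to rotate.
    const std::size_t swapPosition = std::min(q.size(), lockSetSize);
    auto dest = std::next(q.begin(), static_cast<std::ptrdiff_t>(swapPosition));
    q.splice(dest, q, q.begin());  // Relinks the node: no copies are made.
}
```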