Docs: New qutexes design
This commit is contained in:
@@ -0,0 +1,518 @@
|
||||
I just realized that my spinqueueing mechanism is highly power inefficient if a
|
||||
lock needs to be held across a "true async wait"—where the async sequence
|
||||
actually waits on a hardware bottleneck. In this case, the thread acquires the
|
||||
spinlock, then goes to sleep in the kernel schedq until some hardware event
|
||||
occurs, and is then awakened—all while still holding the spinlock.
|
||||
|
||||
Meanwhile, other sequences running on other threads and contending for that lock
|
||||
will be Qspinning. This is acceptable if all I care about is maximum throughput:
|
||||
the Qspinning just re-posts the sequences back into the Q, and eventually
|
||||
they'll acquire the LockSet and proceed.
|
||||
|
||||
Importantly, since the thread itself isn't slept in the kschedQ, it will be
|
||||
deQing and processing other sequences that aren't bottlenecked on the lock held
|
||||
by the sequence waiting for the hardware response. Throughput is indeed
|
||||
maximized.
|
||||
|
||||
However, I just realized that if kernel mutexes expose FD events, I can apply
|
||||
this same logic to sleeplocks: I can wait on the sleeplocks asynchronously
|
||||
instead of synchronously. If I can make my asio::io_service wait on all the
|
||||
mutex FDs requested by sequences on the current thread, then in theory I can put
|
||||
the thread to sleep and know that when the mutex becomes available, I'll be
|
||||
awakened again.
|
||||
|
||||
Hence, I can get the best of both worlds: maximum throughput and power saving.
|
||||
Instead of spinqueueing, we just add the lock FDs to an FD set to be waited on
|
||||
by asio. If any of those locks become available, the kernel scheduler will
|
||||
awaken our asioQ thread, and we can then awaken and retry the lock.
|
||||
|
||||
## Boost asio queue-based sleep locking:
|
||||
|
||||
Instead of using FDs, we can also try to use a fifo Q based mechanism: each lock
|
||||
is a spinlock and a fifo queue.
|
||||
|
||||
Acquire:
|
||||
```
|
||||
lock(spinlock);
|
||||
q.push_back(self);
|
||||
head = q.peek_front();
|
||||
if (head == self) {
|
||||
// We acquired the lock.
|
||||
unlock(spinlock);
|
||||
return;
|
||||
}
|
||||
|
||||
unlock(spinlock);
|
||||
```
|
||||
|
||||
release:
|
||||
```
|
||||
lock(spinlock);
|
||||
// Should get back ourself.
|
||||
q.pop_front();
|
||||
// Wake up the next request in the q.
|
||||
head = q.peek_front();
|
||||
if (head == NULL) {
|
||||
// Nobody was waiting.
|
||||
unlock(spinlock);
|
||||
return;
|
||||
}
|
||||
|
||||
head.thread_to_wake.getIoService().post([]{
|
||||
// This lambda causes thread_to_wake to check this lock's
|
||||
// Q and then proceed to execute since it now owns the lock.
|
||||
});
|
||||
unlock(spinlock);
|
||||
```
|
||||
|
||||
Something like this: it causes the entire thing to be, at least ostensibly,
|
||||
in userspace -- though idk how Boost handles its queues internally.
|
||||
|
||||
## Priortizing LockSets:
|
||||
|
||||
One problem we have with a FIFO-based sleeping system is that it makes it very
|
||||
unlikely that LockSets will ever acquire all of their locks, if there are
|
||||
contenders for those same locks who only need to acquire one of the locks in
|
||||
that LockSet.
|
||||
|
||||
We could theoretically give locksets an advantage by not making them backout
|
||||
if they fail to acquire all locks in their set. I.e: if they get 2/3, then they
|
||||
hold those 2 and then wait for the 3rd. This is problematic because it leaves
|
||||
room open for deadlocks in the form of T1 and T2 needing both LockA and LockB,
|
||||
but they acquire them in reverse order. I.e: T1 takes LockA and now waits for
|
||||
LockB; and T2 takes LockB and now waits for LockA. This will now happen among
|
||||
the LockSets if we don't impose backing out. It may be possible to avoid this
|
||||
using very careful lock ordering and dependency analysis but this project is
|
||||
asynchronous the locking is done in the async sequences and not in the sync
|
||||
accessor functions. So this kind of analysis is almost impossible to do.
|
||||
|
||||
|
||||
We need to think of a way to make the FIFOs biased toward LockSets so that they
|
||||
have an advantage over single-lock acquirers. Or else LockSet sequences will be
|
||||
starved.
|
||||
|
||||
### Timed backoff:
|
||||
|
||||
We could have Locksets be greedy and try to hold on to the locks they've
|
||||
acquired (say, 2/3 and then wait for the 3rd) but then be forced to backoff
|
||||
after a timeout.
|
||||
|
||||
This introduces async event complexity and also the timeout we choose is almost
|
||||
guaranteed to be arbitrary.
|
||||
|
||||
### Fractionally inserted FIFOs:
|
||||
|
||||
We insert sequences with a LockSet.size() of 1, at the back.
|
||||
We insert all other sequences (>1) into first 1/LockSet.size()th position in the
|
||||
Queue.
|
||||
So a Lockset of size 2 will be inserted at the end of the first half of the
|
||||
items in the queue.
|
||||
A Lockset of size 3 will be inserted at the end of the first 33% of items.
|
||||
A lockset of size 4 will be inserted at the end of the first 25% of items.
|
||||
And so on.
|
||||
|
||||
This ensures that higher LockSet.size()s will be prioritized ever higher, and
|
||||
at the same time they don't completely hog everything. Those single-lock
|
||||
sequences that have already naturally progressed past the fraction-mark of a
|
||||
given LockSet size will continue making progress toward the front.
|
||||
|
||||
For queueing sequences with Locksets>1, we can enQ them on the FIFO of the first
|
||||
lock in their set. They'll back off each time anyway, so they'll always be
|
||||
re-trying from the first lock in their set each time.
|
||||
|
||||
#### Impl details:
|
||||
|
||||
We'd like to use std::unordered_set because insertion will require lots of
|
||||
moving items around, but we'll have to use std::vector because we need direct
|
||||
access to insert at arbitrary fractional indexes. It's unlikely the number of
|
||||
items in any lock's Q will ever be large enough to require lots of displacement,
|
||||
but welp there's no reason not to plan for scaling. Although if we end up
|
||||
needing scaling that's a symptom of a bigger problem...with scaling itself lol.
|
||||
There shouldn't be enough items blocked on a lock that we have to design the
|
||||
lock's queue to be scalable.
|
||||
|
||||
### Inverted Fractionally acquired locksets:
|
||||
|
||||
The previous ideas of fractionally inserted lockQs was okay, but the acquisition
|
||||
algo required that the async seq be at the front of a locks queue to
|
||||
successfully acquire that lock. That makes it almost impossible for Locksets>1
|
||||
to ever acquire all of their locks. If we add backoff to that, it basically
|
||||
means no lockset will ever acquire all of its locks.
|
||||
|
||||
Instead what we now do is always insert at the rear (push_back()) and then when
|
||||
acquiring, we check to see if the sequence is in the first
|
||||
1/(1/(LockSet.size())), and if so, it successfully acquires the lock. I.e: if
|
||||
the sequence item isn't in the LAST 1/(LockSet.size()) items, then it succeeds.
|
||||
* For a lockset of size=1: It must be at the front of the queue.
|
||||
* Lockset.size=2: it must be in the first 50% of items.
|
||||
* Lockset.size=3: it must be in the first 66% of items.
|
||||
* Lockset.size=4: It must be in the first 75% of items.
|
||||
|
||||
So this way larger LockSets are favoured, but 1-size locksets make progress.
|
||||
|
||||
For performance:
|
||||
* We obv can just scan the smaller tail percentage for the item instead of
|
||||
scanning the larger front percentage.
|
||||
* If we use a doubly-linked list, we can prolly keep the insertion iterator
|
||||
and this way we won't have to actually find the item in the lockQ when we wish
|
||||
to eventually remove it from the lockQ when releasing the lock.
|
||||
|
||||
## Total overall design:
|
||||
|
||||
### Asio queues and Lockvokers:
|
||||
|
||||
Lockvokers are initially enqueued on a CompThread's queue. When the lockvoker
|
||||
first runs, it checks a flag to see if it has been "registered" into the queues
|
||||
for all locks in its set. If not, then it "registers" itself in each lock's
|
||||
ticketQ and then attempts to acquire each lock. Registration and acquisition
|
||||
are logically separate operations; and locks will often attempt acquisition
|
||||
many times after first registering, without needing to register again. Ideally
|
||||
we can implement a LockSet::registerAndTryAcquireAll() method, but that's for
|
||||
us to think about later.
|
||||
|
||||
```
|
||||
class LockSet
|
||||
{
|
||||
/* Add this either inside of LockSet or outside of it -- depends on whether
|
||||
* it's we can get it to compile because I'm seeing some potential circular
|
||||
* definition dependencies.
|
||||
*/
|
||||
typedef std::pair<TicketLock, LockDesc> LockDesc;
|
||||
|
||||
/* Find a LockDesc -- useful below */
|
||||
LockDesc &getLockDesc(TicketLock &criterionLock)
|
||||
{
|
||||
for (auto &reqLock: requiredLocks) {
|
||||
if (reqLock.first == &criterionLock) { return reqLock; }
|
||||
}
|
||||
|
||||
// Should never happen.
|
||||
throw;
|
||||
}
|
||||
};
|
||||
|
||||
LockSet::register(LockerAndInvoker &lockvoker)
|
||||
{
|
||||
for (auto &lock: lockset.locks) {
|
||||
// Register the Lockvoker object in each lock's ticketQ.
|
||||
lock.second = lock.first.register(lockvoker);
|
||||
}
|
||||
registered = true;
|
||||
}
|
||||
|
||||
bool LockSet::tryAcquire(LockerAndInvoker &lockvoker)
|
||||
{
|
||||
if (!registered) {
|
||||
// Should never happen.
|
||||
throw ...;
|
||||
}
|
||||
int nLocksAcquired=0,
|
||||
nLocksInSet = lockset.size();
|
||||
for (auto &lock: lockset.locks) {
|
||||
if (!lock.first.tryAcquire(nLocksInSet)) {
|
||||
break;
|
||||
}
|
||||
|
||||
nLocksAcquired++;
|
||||
}
|
||||
|
||||
if (nLocksAcquired == nLocksInSet) {
|
||||
// Success
|
||||
return true;
|
||||
}
|
||||
|
||||
for (int i=0; i<nLocksAcquired; i++) {
|
||||
// Backoff does different stuff from release();
|
||||
locks[i].first.backoff(lockvoker);
|
||||
}
|
||||
}
|
||||
|
||||
LockSet::release()
|
||||
{
|
||||
for (auto &lock: requiredLocks) {
|
||||
lock.release();
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Now, the TicketLock class is what we'll use for synchronization. It's just a
|
||||
combination of 2 atomic<bool> and a std::list.
|
||||
|
||||
```
|
||||
class SpinLock
|
||||
{
|
||||
/* Modify to add methods acquire() and release() which busy-wait.
|
||||
*/
|
||||
void acquire();
|
||||
void release();
|
||||
};
|
||||
|
||||
class LockSet
|
||||
{
|
||||
/* Modify the std::vector of SpinLock to instead be:
|
||||
* std::vector<LockDesc> locks;
|
||||
*/
|
||||
std::vector<LockDesc> locks;
|
||||
}
|
||||
|
||||
class TicketLock
|
||||
{
|
||||
public:
|
||||
std::list<LockerAndInvoker>::iterator register(
|
||||
const LockerAndInvoker &lockvoker
|
||||
)
|
||||
{
|
||||
/** EXPLANATION:
|
||||
* Just insert the lockvoker into the rear of the list.
|
||||
*
|
||||
* Then, since we want to store the
|
||||
*/
|
||||
std::list<LockerAndInvoker>::iterator it;
|
||||
|
||||
lock.acquire();
|
||||
queue.push_back(lockvoker);
|
||||
it = queue.end();
|
||||
--it;
|
||||
lock.release();
|
||||
|
||||
return it;
|
||||
}
|
||||
|
||||
void unregister(std::list<LockerAndInvoker>::iterator it)
|
||||
{
|
||||
lock.acquire();
|
||||
queue.erase(it);
|
||||
lock.release();
|
||||
}
|
||||
|
||||
bool tryAcquire(LockerAndInvoker &tryingLockvoker, int nRequiredLocks)
|
||||
{
|
||||
lock.acquire();
|
||||
qNItems = queue.size();
|
||||
|
||||
if (qNItems < 1) {
|
||||
lock.release();
|
||||
|
||||
/** EXPLANATION:
|
||||
* requiredLocks before ever trying to tryAcquire() them, so if
|
||||
* tryAcquire is being called, that must mean that queue.size() > 0.
|
||||
*
|
||||
* Ergo this should never happen.
|
||||
*/
|
||||
throw;
|
||||
}
|
||||
|
||||
if (isOwned) {
|
||||
lock.release();
|
||||
return false;
|
||||
}
|
||||
|
||||
if (nRequiredLocks == 1) {
|
||||
isOwned.store(true);
|
||||
lock.release();
|
||||
return true;
|
||||
}
|
||||
|
||||
/** EXPLANATION:
|
||||
* From here:
|
||||
* if qNItems == 1 the we are the only one in the ticketQ and we have
|
||||
* successfully acquired the lock.
|
||||
* If qNitems / nRequiredLocks == 0, then we acquire by default since
|
||||
* the number of items in the ticketQ guarantees that we are in the top
|
||||
* X% for that nRequiredLocks.
|
||||
* If qNItems / nRequiredLocks >= 1, then we must do the normal algo:
|
||||
* Check the last (qNItems/nRequiredLocks) items, and if the item isn't
|
||||
* in those items, then it must be in the earlier ones (obviously).
|
||||
* Hence this Lockvoker acquisition should be considered successful.
|
||||
*
|
||||
* EXPLANATION 2:
|
||||
* You'll notice that we don't do actual percentages but rather we just
|
||||
* do discrete fractions -- this makes the algo more deterministic
|
||||
* and much easier to reason about. I.e:
|
||||
* If nRequiredLocks is 6 and qNItems==3:
|
||||
* we don't actually calculate that the Lockvoker item must be in
|
||||
* the top (100-17%), and then try to calculate whether we ought to
|
||||
* consider the 3rd item to be in the last 17-percentile. We just
|
||||
* do a fractional count and assume complete discreteness.
|
||||
*/
|
||||
const int nRearItemsToScan = qNItems / nRequiredLocks;
|
||||
|
||||
if (qNItems == 1 || nRearItemsToScan < 1) {
|
||||
isOwned.store(true);
|
||||
lock.release();
|
||||
return true;
|
||||
}
|
||||
|
||||
auto rIt = queue.rbegin();
|
||||
auto rEndIt = queue.rend();
|
||||
bool foundInRear = false;
|
||||
for (int i=0; i<nRearItemsToScan && rIt != rEndIt; rIt++, i++)
|
||||
{
|
||||
if (&(*rIt) != &tryingLockvoker) { continue; }
|
||||
|
||||
foundInRear == true;
|
||||
break;
|
||||
}
|
||||
|
||||
if (foundInRear) {
|
||||
lock.release();
|
||||
return false;
|
||||
}
|
||||
|
||||
/* Not found in rear: this means the item is in the top X%. That means
|
||||
* it should be allowed to claim the lock.
|
||||
*/
|
||||
isOwned.store(true);
|
||||
lock.release();
|
||||
return true;
|
||||
}
|
||||
|
||||
backoff(LockerAndInvoker &failedAcquirer)
|
||||
{
|
||||
lock.acquire();
|
||||
|
||||
// Rotate queue members if failedAcquirer is at front of queue.
|
||||
LockerAndInvoker &currFront = queue.front();
|
||||
if (currFront == failedAcquirer)
|
||||
{
|
||||
/** EXPLANATION:
|
||||
* Rotate the top LockSet.size() items in the queue by moving
|
||||
* the failedAcquirer to the last position in the top
|
||||
* LockSet.size() items within the queue.
|
||||
*
|
||||
* I.e: if queue.size()==20, and lockSet.size()==5, then move
|
||||
* failedAcquirer from the front the 5th position in the queue,
|
||||
* which should push the other 4 items forward.
|
||||
* If queue.size==3 and LockSet.size()==5, then just
|
||||
* push_back(failedAcquirer).
|
||||
*
|
||||
* It is impossible for a TicketLock queue to have only one
|
||||
* item in it, yet for that Lockvoker item to have failed to
|
||||
* acquire the ticketLock. Being the only item in the ticketQ
|
||||
* means that you must succeed at acquiring the TicketLock.
|
||||
*/
|
||||
int swapPosition = min(
|
||||
queue.size(),
|
||||
failedAcquirer.serializedContinuation.requiredLocks.size());
|
||||
|
||||
/* EXPLANATION:
|
||||
* Swap them here.
|
||||
*
|
||||
* The reason why we do this swap is to avoid a particular kind of
|
||||
* deadlock wherein a grid of async requests is perfectly configured
|
||||
* so as to guarantee that none of them can make any forward
|
||||
* progress unless they get reordered.
|
||||
*
|
||||
* Consider 2 different locks with 2 different items in them
|
||||
* each, both of which come from 2 particular requests:
|
||||
* TicketLock1: Lockvoker1, Lv2
|
||||
* TicketLock2: Lv2, Lv1
|
||||
*
|
||||
* Moreover, both of these lockvokers have requiredLocks.size()==2,
|
||||
* and the particular 2 locks that each one requires are indeed
|
||||
* TicketLock1 and TicketLock2.
|
||||
*
|
||||
* This particular setup basically means that in TL1's queue, Lv1
|
||||
* will wakeup since it's at the front of TL1. It'll successfully
|
||||
* acquire TL1 (since it's at the front), and then it'll try to
|
||||
* acquire TL2. But since Lv1 isn't in the top 50% of items in TL2's
|
||||
* queue, Lv1 will fail to acquire TL2.
|
||||
*
|
||||
* Then similarly, in TL2's queue, Lv2 will wakeup since it's at
|
||||
* the front. Again, it'll successfully acquire TL2 since it's at
|
||||
* the front of TL2's queue. But then it'll try to acquire TL1.
|
||||
* Since it's not in the top 50% of TL1's enqueued items, it'll fail
|
||||
* to acquire TL1.
|
||||
*
|
||||
* N.B: This type of perfectly ordered deadlock can occur in any
|
||||
* kind of NxN situation where ticketQ.size()==requiredLocks.size().
|
||||
* That could be 4x4, 5x5, 6x6, etc. It doesn't happen in 1x1
|
||||
* because a Lockvoker that only requires one lock will always just
|
||||
* succeed if it's at the front of its queue.
|
||||
*
|
||||
* This state of affairs is stable and will persist unless these
|
||||
* queues are reordered in some way. Hence: that's why we rotate the
|
||||
* items in a TicketLockQ after backing off of it. Backing off means
|
||||
* Not necessarily that the calling LockVoker failed to acquire
|
||||
* THIS PARTICULAR TicketLock, but rather than it failed to acquire
|
||||
* ALL of its required locks.
|
||||
*
|
||||
* Hence, if we are backing out, we should also rotate the items
|
||||
* in the queue if the current front item is the failed acquirer.
|
||||
* So that's why we do this rotation here.
|
||||
*/
|
||||
}
|
||||
|
||||
LockerAndInvoker &newFront = queue.front();
|
||||
|
||||
lock.release();
|
||||
|
||||
wakeUp(newFront);
|
||||
}
|
||||
|
||||
void release(LockerAndInvoker &prevOwner)
|
||||
{
|
||||
lock.acquire();
|
||||
|
||||
#ifndef CONFIG_LOCKVOKER_AGGRESSIVE_WAKEUPS
|
||||
LockerAndInvoker &oldFront = queue.front();
|
||||
#endif
|
||||
|
||||
unregister(prevOwner.serializedContinuation.requiredLocks.getDesc(
|
||||
*this).second);
|
||||
|
||||
/** NOTE:
|
||||
* I am not sure whether we should only wake up the front item if
|
||||
* the prev owner was the previous front item; or whether we should
|
||||
* always wake up the new front item.
|
||||
*
|
||||
* Recall that because a sequence can acquire a ticketLock without being
|
||||
* at the front of the queue (because it could merely be in the top X%
|
||||
* of items instead), this means that during a call to release(), the
|
||||
* owning async sequence may not be at the front -- it needs only be in
|
||||
* the top X% of items.
|
||||
*
|
||||
* When the owning sequence is the front, then we should definitely wake
|
||||
* the new front item after removing the previous owner.
|
||||
*
|
||||
* But when the front item is not the prev owner, then should we wake it
|
||||
* up? It should have been awakened previously, so if it's still at the
|
||||
* front, that implies that it failed to make forward progress last time
|
||||
* it awoke. You could argue that during the interim while this owner
|
||||
* did its thing, it's possible for the current front item to have
|
||||
* become capable of acquiring all of its requiredLocks, due to changes
|
||||
* in the other locks in its requiredLocks set. This is a fair argument.
|
||||
*
|
||||
* You could also argue that we can just wait until its registered
|
||||
* items in its other locks' ticketQs reach the front of those other
|
||||
* ticketQs, and then when that happens, those locks will wake it up
|
||||
* when release() is called.
|
||||
*
|
||||
* The latter eases contention since one could argue that we have a
|
||||
* surer chance of it successfully acquiring all of its requiredLocks if
|
||||
* we wait until it bubbles to the front of another ticketQ. Whereas,
|
||||
* if we aggressively/greedily awaken it to try just because an item in
|
||||
* another ticketQ has been removed, we're just introducing contention.
|
||||
* Both cases have good arguments. The aggressive approach enables us to
|
||||
* potentially retire more requests, and thus increase throughput.
|
||||
*
|
||||
* This current pseudocode assumes the latter and waits for the other
|
||||
* locks' ticketQs to wake it up when it reaches the top in their
|
||||
* Qs.
|
||||
*/
|
||||
LockerAndInvoker &newFront = queue.front();
|
||||
|
||||
lock.release();
|
||||
|
||||
#ifndef CONFIG_LOCKVOKER_AGGRESSIVE_WAKEUPS
|
||||
if (&newFront != &oldFront)
|
||||
#endif
|
||||
{ wakeUp(newFront); }
|
||||
}
|
||||
|
||||
public:
|
||||
SpinLock lock;
|
||||
std::atomic<bool> isOwned;
|
||||
std::list<LockerAndInvoker> queue;
|
||||
};
|
||||
```
|
||||
Reference in New Issue
Block a user