diff --git a/docs/design/qutexes.md b/docs/design/qutexes.md
new file mode 100644
index 0000000..c0551d5
--- /dev/null
+++ b/docs/design/qutexes.md
@@ -0,0 +1,518 @@
+I just realized that my spinqueueing mechanism is highly power inefficient if a
+lock needs to be held across a "true async wait", where the async sequence
+actually waits on a hardware bottleneck. In this case, the thread acquires the
+spinlock, then goes to sleep in the kernel schedq until some hardware event
+occurs, and is then awakened, all while still holding the spinlock.
+
+Meanwhile, other sequences running on other threads and contending for that lock
+will be Qspinning. This is acceptable if all I care about is maximum throughput:
+the Qspinning just re-posts the sequences back into the Q, and eventually
+they'll acquire the LockSet and proceed.
+
+Importantly, since the thread itself isn't slept in the kschedQ, it will be
+deQing and processing other sequences that aren't bottlenecked on the lock held
+by the sequence waiting for the hardware response. Throughput is indeed
+maximized.
+
+However, I just realized that if kernel mutexes expose FD events, I can apply
+this same logic to sleeplocks: I can wait on the sleeplocks asynchronously
+instead of synchronously. If I can make my asio::io_service wait on all the
+mutex FDs requested by sequences on the current thread, then in theory I can put
+the thread to sleep and know that when the mutex becomes available, I'll be
+awakened again.
+
+Hence, I can get the best of both worlds: maximum throughput and power saving.
+Instead of spinqueueing, we just add the lock FDs to an FD set to be waited on
+by asio. If any of those locks becomes available, the kernel scheduler will
+awaken our asioQ thread, and we can then retry the lock.
+
+## Boost asio queue-based sleep locking:
+
+Instead of using FDs, we can also try to use a fifo Q based mechanism: each lock
+is a spinlock and a fifo queue.
+
+Acquire:
+```
+lock(spinlock);
+q.push_back(self);
+head = q.peek_front();
+if (head == self) {
+    // We acquired the lock.
+    unlock(spinlock);
+    return;
+}
+
+unlock(spinlock);
+```
+
+Release:
+```
+lock(spinlock);
+// Should get back ourself.
+q.pop_front();
+// Wake up the next request in the q.
+head = q.peek_front();
+if (head == NULL) {
+    // Nobody was waiting.
+    unlock(spinlock);
+    return;
+}
+
+head.thread_to_wake.getIoService().post([]{
+    // This lambda causes thread_to_wake to check this lock's
+    // Q and then proceed to execute since it now owns the lock.
+});
+unlock(spinlock);
+```
+
+Something like this: it causes the entire thing to be, at least ostensibly,
+in userspace -- though idk how Boost handles its queues internally.
+
+## Prioritizing LockSets:
+
+One problem we have with a FIFO-based sleeping system is that it makes it very
+unlikely that LockSets will ever acquire all of their locks, if there are
+contenders for those same locks who only need to acquire one of the locks in
+that LockSet.
+
+We could theoretically give locksets an advantage by not making them back out
+if they fail to acquire all locks in their set. I.e: if they get 2/3, then they
+hold those 2 and then wait for the 3rd. This is problematic because it leaves
+room open for deadlocks in the form of T1 and T2 needing both LockA and LockB,
+but they acquire them in reverse order. I.e: T1 takes LockA and now waits for
+LockB; and T2 takes LockB and now waits for LockA. This will now happen among
+the LockSets if we don't impose backing out. It may be possible to avoid this
+using very careful lock ordering and dependency analysis, but this project is
+asynchronous: the locking is done in the async sequences and not in the sync
+accessor functions. So this kind of analysis is almost impossible to do.
+
+
+We need to think of a way to make the FIFOs biased toward LockSets so that they
+have an advantage over single-lock acquirers. Or else LockSet sequences will be
+starved.
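The back-out rule above can be tried in isolation. The following is a minimal sketch, not this project's LockSet: `TryLock` and `tryAcquireAll` are hypothetical names, and a bare atomic flag stands in for whatever lock type ends up being used.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lock that can only be tried, never waited on.
struct TryLock {
    std::atomic<bool> held{false};

    // Returns true if we took the lock, false if someone else holds it.
    bool tryAcquire() { return !held.exchange(true, std::memory_order_acquire); }
    void release() { held.store(false, std::memory_order_release); }
};

// Try to take every lock in the set, in order. On the first failure, release
// the locks already taken, so a partial set is never held while waiting.
bool tryAcquireAll(const std::vector<TryLock *> &locks)
{
    std::size_t nAcquired = 0;
    for (TryLock *l : locks) {
        if (!l->tryAcquire()) { break; }
        ++nAcquired;
    }

    if (nAcquired == locks.size()) { return true; }

    // Back out: give up everything we managed to grab.
    for (std::size_t i = 0; i < nAcquired; ++i) { locks[i]->release(); }
    return false;
}
```

Because a failed attempt leaves nothing held, the T1/T2 reverse-order deadlock cannot form; the cost is exactly the starvation risk described above.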
+
+### Timed backoff:
+
+We could have Locksets be greedy and try to hold on to the locks they've
+acquired (say, 2/3 and then wait for the 3rd) but then be forced to back off
+after a timeout.
+
+This introduces async event complexity and also the timeout we choose is almost
+guaranteed to be arbitrary.
+
+### Fractionally inserted FIFOs:
+
+We insert sequences with a LockSet.size() of 1, at the back.
+We insert all other sequences (>1) at the 1/LockSet.size() mark of the
+Queue.
+So a Lockset of size 2 will be inserted at the end of the first half of the
+items in the queue.
+A Lockset of size 3 will be inserted at the end of the first 33% of items.
+A lockset of size 4 will be inserted at the end of the first 25% of items.
+And so on.
+
+This ensures that higher LockSet.size()s will be prioritized ever higher, and
+at the same time they don't completely hog everything. Those single-lock
+sequences that have already naturally progressed past the fraction-mark of a
+given LockSet size will continue making progress toward the front.
+
+For queueing sequences with Locksets>1, we can enQ them on the FIFO of the first
+lock in their set. They'll back off each time anyway, so they'll always be
+re-trying from the first lock in their set each time.
+
+#### Impl details:
+
+We'd like to use std::unordered_set because insertion will require lots of
+moving items around, but we'll have to use std::vector because we need direct
+access to insert at arbitrary fractional indexes. It's unlikely the number of
+items in any lock's Q will ever be large enough to require lots of displacement,
+but welp there's no reason not to plan for scaling. Although if we end up
+needing scaling that's a symptom of a bigger problem...with scaling itself lol.
+There shouldn't be enough items blocked on a lock that we have to design the
+lock's queue to be scalable.
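The fractional insertion rule can be sketched on a plain std::vector. This is an illustration under assumptions: `fractionalInsert` is a hypothetical helper, index 0 is taken to be the front of the queue, and the fraction is computed with integer division.

```cpp
#include <cstddef>
#include <vector>

// Insert `item` at the end of the first 1/lockSetSize of `q`.
// Index 0 is the front of the queue; single-lock sequences go to the back.
template <typename T>
void fractionalInsert(std::vector<T> &q, const T &item, std::size_t lockSetSize)
{
    if (lockSetSize <= 1) {
        q.push_back(item);
        return;
    }

    // Integer division: a queue of 4 and a LockSet of size 2 gives index 2.
    const std::size_t idx = q.size() / lockSetSize;
    q.insert(q.begin() + static_cast<std::ptrdiff_t>(idx), item);
}
```

With a queue of `{1, 2, 3, 4}`, a size-2 LockSet lands at index 2, behind the first half; every element after it is shifted by one, which is the displacement cost mentioned above.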
+
+### Inverted Fractionally acquired locksets:
+
+The previous idea of fractionally inserted lockQs was okay, but the acquisition
+algo required that the async seq be at the front of a lock's queue to
+successfully acquire that lock. That makes it almost impossible for Locksets>1
+to ever acquire all of their locks. If we add backoff to that, it basically
+means no lockset will ever acquire all of its locks.
+
+Instead what we now do is always insert at the rear (push_back()) and then when
+acquiring, we check to see if the sequence is in the first
+(1 - 1/LockSet.size()) of the queue, and if so, it successfully acquires the
+lock. I.e: if the sequence item isn't in the LAST 1/(LockSet.size()) items,
+then it succeeds.
+* For a lockset of size=1: It must be at the front of the queue.
+* Lockset.size=2: it must be in the first 50% of items.
+* Lockset.size=3: it must be in the first 66% of items.
+* Lockset.size=4: It must be in the first 75% of items.
+
+So this way larger LockSets are favoured, but 1-size locksets make progress.
+
+For performance:
+* We obv can just scan the smaller tail percentage for the item instead of
+  scanning the larger front percentage.
+* If we use a doubly-linked list, we can prolly keep the insertion iterator
+  and this way we won't have to actually find the item in the lockQ when we wish
+  to eventually remove it from the lockQ when releasing the lock.
+
+## Total overall design:
+
+### Asio queues and Lockvokers:
+
+Lockvokers are initially enqueued on a CompThread's queue. When the lockvoker
+first runs, it checks a flag to see if it has been "registered" into the queues
+for all locks in its set. If not, then it "registers" itself in each lock's
+ticketQ and then attempts to acquire each lock. Registration and acquisition
+are logically separate operations; and lockvokers will often attempt acquisition
+many times after first registering, without needing to register again.
+Ideally we can implement a LockSet::registerAndTryAcquireAll() method, but
+that's for us to think about later.
+
+```
+class LockSet
+{
+    /* Add this either inside of LockSet or outside of it -- depends on
+     * whether we can get it to compile because I'm seeing some potential
+     * circular definition dependencies.
+     *
+     * NB: the template arguments here and below are an assumption: a pointer
+     * to the lock, plus the iterator returned by TicketLock::register().
+     */
+    typedef std::pair<
+        TicketLock *,
+        std::list<std::reference_wrapper<const LockerAndInvoker>>::iterator
+        > LockDesc;
+
+    /* Find a LockDesc -- useful below */
+    LockDesc &getLockDesc(TicketLock &criterionLock)
+    {
+        for (auto &reqLock: requiredLocks) {
+            if (reqLock.first == &criterionLock) { return reqLock; }
+        }
+
+        // Should never happen.
+        throw;
+    }
+};
+
+void LockSet::register(LockerAndInvoker &lockvoker)
+{
+    for (auto &lock: lockset.locks) {
+        // Register the Lockvoker object in each lock's ticketQ.
+        lock.second = lock.first->register(lockvoker);
+    }
+    registered = true;
+}
+
+bool LockSet::tryAcquire(LockerAndInvoker &lockvoker)
+{
+    if (!registered) {
+        // Should never happen.
+        throw ...;
+    }
+    int nLocksAcquired=0,
+        nLocksInSet = lockset.size();
+    for (auto &lock: lockset.locks) {
+        if (!lock.first->tryAcquire(lockvoker, nLocksInSet)) {
+            break;
+        }
+
+        nLocksAcquired++;
+    }
+
+    if (nLocksAcquired == nLocksInSet) {
+        // Success
+        return true;
+    }
+
+    /* Back out of the locks we did get.
+     * NB: this loop body and the final return are an assumed completion;
+     * TicketLock::release() is a hypothetical name.
+     */
+    for (int i=0; i<nLocksAcquired; i++) {
+        lockset.locks[i].first->release();
+    }
+
+    return false;
+}
+```
+
+A TicketLock combines a SpinLock, a std::atomic<bool> ownership flag and a
+std::list ticketQ.
+
+```
+class SpinLock
+{
+    /* Modify to add methods acquire() and release() which busy-wait.
+     */
+    void acquire();
+    void release();
+};
+
+class LockSet
+{
+    /* Modify the std::vector of SpinLock to instead be:
+     *  std::vector<LockDesc> locks;
+     */
+    std::vector<LockDesc> locks;
+};
+
+class TicketLock
+{
+public:
+    std::list<std::reference_wrapper<const LockerAndInvoker>>::iterator register(
+        const LockerAndInvoker &lockvoker
+        )
+    {
+        /** EXPLANATION:
+         * Just insert the lockvoker into the rear of the list.
+         *
+         * Then, since we want to store the insertion iterator so that the
+         * lockvoker can later be removed without searching the list, return
+         * it to the caller.
+         *
+         * NB: the list's element type is an assumption.
+         */
+        std::list<std::reference_wrapper<const LockerAndInvoker>>::iterator it;
+
+        lock.acquire();
+        queue.push_back(lockvoker);
+        it = queue.end();
+        --it;
+        lock.release();
+
+        return it;
+    }
+
+    void unregister(
+        std::list<std::reference_wrapper<const LockerAndInvoker>>::iterator it
+        )
+    {
+        lock.acquire();
+        queue.erase(it);
+        lock.release();
+    }
+
+    bool tryAcquire(LockerAndInvoker &tryingLockvoker, int nRequiredLocks)
+    {
+        lock.acquire();
+        const int qNItems = queue.size();
+
+        if (qNItems < 1) {
+            lock.release();
+
+            /** EXPLANATION:
+             * A Lockvoker registers itself in the ticketQ of each of its
+             * requiredLocks before ever trying to tryAcquire() them, so if
+             * tryAcquire is being called, that must mean that
+             * queue.size() > 0.
+             *
+             * Ergo this should never happen.
+             */
+            throw;
+        }
+
+        if (isOwned) {
+            lock.release();
+            return false;
+        }
+
+        if (nRequiredLocks == 1) {
+            isOwned.store(true);
+            lock.release();
+            return true;
+        }
+
+        /** EXPLANATION:
+         * From here:
+         * if qNItems == 1 then we are the only one in the ticketQ and we have
+         * successfully acquired the lock.
+         * If qNItems / nRequiredLocks == 0, then we acquire by default since
+         * the number of items in the ticketQ guarantees that we are in the top
+         * X% for that nRequiredLocks.
+         * If qNItems / nRequiredLocks >= 1, then we must do the normal algo:
+         *  Check the last (qNItems/nRequiredLocks) items, and if the item isn't
+         *  in those items, then it must be in the earlier ones (obviously).
+         *  Hence this Lockvoker acquisition should be considered successful.
+         *
+         * EXPLANATION 2:
+         * You'll notice that we don't do actual percentages but rather we just
+         * do discrete fractions -- this makes the algo more deterministic
+         * and much easier to reason about. I.e:
+         *  If nRequiredLocks is 6 and qNItems==3:
+         *      we don't actually calculate that the Lockvoker item must be in
+         *      the top (100-17%), and then try to calculate whether we ought to
+         *      consider the 3rd item to be in the last 17-percentile. We just
+         *      do a fractional count and assume complete discreteness.
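+         */
+        const int nRearItemsToScan = qNItems / nRequiredLocks;
+
+        if (qNItems == 1 || nRearItemsToScan < 1) {
+            isOwned.store(true);
+            lock.release();
+            return true;
+        }
+
+        auto rIt = queue.rbegin();
+        auto rEndIt = queue.rend();
+        bool foundInRear = false;
+        /* NB: everything from this loop down is an assumed completion that
+         * follows the EXPLANATION above; the element type of `queue` is an
+         * assumption too.
+         */
+        for (int i=0; i<nRearItemsToScan && rIt != rEndIt; ++i, ++rIt) {
+            if (&rIt->get() == &tryingLockvoker) {
+                foundInRear = true;
+                break;
+            }
+        }
+
+        if (foundInRear) {
+            // Still in the last 1/nRequiredLocks of the ticketQ: no luck yet.
+            lock.release();
+            return false;
+        }
+
+        isOwned.store(true);
+        lock.release();
+        return true;
+    }
+
+    SpinLock lock;
+    std::atomic<bool> isOwned;
+    std::list<std::reference_wrapper<const LockerAndInvoker>> queue;
+};
+```
\ No newline at end of file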
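The rear-fraction check at the heart of tryAcquire() can be exercised on its own. This is a minimal sketch under assumptions: `isOutsideRearFraction` is a hypothetical helper, plain ints stand in for Lockvokers, and the spinlock and ownership flag are left out so that only the discrete fraction arithmetic is shown.

```cpp
#include <cstddef>
#include <list>

// Returns true if `item` is NOT among the last (q.size() / nRequiredLocks)
// entries of `q`, i.e. the item sits far enough forward to acquire.
// Assumes nRequiredLocks >= 1; does not check that `item` is in `q` at all.
bool isOutsideRearFraction(
    const std::list<int> &q, int item, std::size_t nRequiredLocks
    )
{
    // Discrete fraction, as in EXPLANATION 2: integer division, no percentages.
    const std::size_t nRearItemsToScan = q.size() / nRequiredLocks;

    auto rIt = q.rbegin();
    for (std::size_t i = 0; i < nRearItemsToScan && rIt != q.rend(); ++i, ++rIt) {
        if (*rIt == item) { return false; }
    }
    return true;
}
```

Note that with nRequiredLocks == 1 the scan covers the whole queue, so the check fails for every item; that is consistent with tryAcquire() treating single-lock sets as a separate case before reaching the scan.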