Docs: Document the locking mechanism we plan to use

This new locking mechanism is cumbersome to use, but it maximizes throughput: it trades higher memory usage for higher throughput. We may even end up being able to get the high throughput without incurring the high memory usage by using std::bind objects, etc.
2025-09-10 18:14:20 -04:00
parent e08dc0678b
commit b7cf4c9135
2 changed files with 59 additions and 282 deletions
@@ -1,282 +0,0 @@
-# Adaptive Resource Acquisition with Re-queuing
-
-## Overview
-
-This document describes a novel synchronization pattern that combines the benefits of spinlocks, mutexes, and queuing systems while avoiding their respective drawbacks. The pattern is designed for high-throughput async systems where multiple threads need to coordinate access to shared resources without blocking or wasting CPU cycles.
-
-## Problem Statement
-
-Traditional synchronization mechanisms have significant trade-offs that limit system performance:
-
- **Mutexes**: Block threads, causing context switches and reduced throughput
- **Spinlocks**: Waste CPU cycles while waiting, preventing other work from proceeding
- **Pure Queuing**: Serializes all operations, reducing parallelism unnecessarily
-
-The challenge is to maintain data consistency across multi-segment async operations while maximizing system throughput. In high-performance systems, the overhead of context switching can be substantial, and CPU cycles are precious resources that should not be wasted on busy-waiting.
-
-## Core Concept
-
-The Adaptive Resource Acquisition pattern uses **atomic flags on shared objects** combined with **immediate re-queuing** to achieve optimal performance characteristics:
-
-1. **No thread blocking** - Threads never sleep or context switch while waiting for a contended resource, maintaining maximum responsiveness
-2. **No CPU waste** - No busy-waiting when other work could proceed, ensuring efficient resource utilization
-3. **Maximum throughput** - Threads always process available work, maximizing system productivity
-4. **Data consistency** - Atomic resource acquisition preserves integrity without traditional locking overhead
-
-This approach fundamentally changes how we think about resource coordination, treating it as a flow management problem rather than a blocking synchronization problem.
-
-## Architecture
-
-### Resource Objects
-
-Each shared object that requires synchronization carries an atomic flag that indicates its availability. This flag serves as the primary coordination mechanism, allowing threads to atomically claim ownership without the overhead of traditional locks.
-
-The resource object structure is intentionally simple, containing only the essential coordination mechanism and the resource-specific data. This minimalism reduces memory overhead and improves cache locality.
-
-### Request Structure
-
-Async operations are encapsulated as requests that specify their resource requirements and the operation to be performed. This encapsulation allows the system to reason about resource dependencies before attempting execution, enabling intelligent scheduling decisions.
-
-The request structure includes metadata such as priority levels, which can be used for advanced scheduling policies. This flexibility allows the system to adapt to different workload characteristics and business requirements.
-
-### Resource Manager
-
-The core component orchestrates resource acquisition and request processing through a sophisticated coordination mechanism. It maintains a registry of all available resources and manages the flow of requests through the system.
-
-The resource manager operates on a simple principle: attempt to acquire all required resources atomically, and if successful, execute the operation immediately. If any resource is unavailable, the request is immediately re-queued for later processing without any blocking or waiting.
-
-## Algorithm
-
-### Resource Acquisition Process
-
-The resource acquisition process follows a simple but effective strategy. For each request, the system attempts to atomically acquire all required resources in a single pass. This atomicity is crucial for maintaining data consistency and preventing race conditions.
-
-If all resources can be acquired atomically, the operation proceeds immediately. This represents the optimal case where no coordination overhead is incurred beyond the atomic operations themselves. The system achieves maximum throughput in this scenario.
-
-If any resource cannot be acquired, the system immediately releases any resources that were successfully acquired and re-queues the request. This approach ensures that resources are never held unnecessarily and that the system can continue processing other requests without delay.
-
-The key insight is that failed acquisition attempts are not failures in the traditional sense, but rather normal flow control mechanisms. The system treats resource contention as a scheduling opportunity rather than a blocking condition.
-
-#### Atomic Resource Acquisition Pseudocode
-
-```
-TRY_ACQUIRE_RESOURCES(resource_names):
-    acquired_resources = []
-    
-    FOR EACH resource_name IN resource_names:
-        resource = GET_RESOURCE(resource_name)
-        expected_value = false
-        desired_value = true
-        
-        // Atomic compare-and-swap operation
-        IF ATOMIC_COMPARE_EXCHANGE_STRONG(resource.flag, expected_value, desired_value):
-            // Successfully acquired this resource
-            acquired_resources.ADD(resource)
-        ELSE:
-            // Failed to acquire this resource
-            // Release all previously acquired resources
-            FOR EACH acquired_resource IN acquired_resources:
-                ATOMIC_STORE(acquired_resource.flag, false)
-            RETURN false
-    
-    // Successfully acquired all resources
-    RETURN true
-```
-
-### Request Processing Workflow
-
-The request processing workflow is designed for maximum efficiency. Each request is processed exactly once per cycle, either by successful execution or by re-queuing for later processing.
-
-When a request is successfully processed, the system immediately releases all acquired resources, making them available for other requests. This rapid resource turnover maximizes system throughput and minimizes resource contention.
-
-The re-queuing mechanism ensures that no request is lost, while the immediate nature of the re-queuing prevents any blocking or waiting. Requests that cannot be processed immediately simply wait their turn in the queue, allowing other requests to proceed without interference.
-
-#### Basic Processing Algorithm
-
-```
-PROCESS_REQUEST(request):
-    // Step 1: The request has already been dequeued by the caller
-    // (the event loop or a worker thread), so it is not dequeued again here
-    
-    // Step 2: Attempt atomic resource acquisition
-    resources_acquired = []
-    acquisition_successful = true
-    
-    FOR EACH resource_name IN request.required_resources:
-        resource = GET_RESOURCE(resource_name)
-        IF ATOMIC_COMPARE_EXCHANGE(resource.flag, false, true):
-            resources_acquired.ADD(resource)
-        ELSE:
-            acquisition_successful = false
-            BREAK
-    
-    // Step 3: Handle acquisition result
-    IF acquisition_successful:
-        // Execute the operation
-        EXECUTE_OPERATION(request.operation)
-        
-        // Release all acquired resources
-        FOR EACH resource IN resources_acquired:
-            ATOMIC_STORE(resource.flag, false)
-    ELSE:
-        // Release any partially acquired resources
-        FOR EACH resource IN resources_acquired:
-            ATOMIC_STORE(resource.flag, false)
-        
-        // Re-queue the request for later processing
-        ENQUEUE_REQUEST(request)
-```
-
-### Event Loop Management
-
-The event loop continuously processes requests from the queue until no more requests are available. This simple loop structure ensures that the system is always making progress on available work.
-
-The loop processes requests in the order they were queued, providing a natural fairness mechanism. However, the system can be extended with priority queuing or other scheduling policies to meet specific requirements.
-
-The event loop is designed to be efficient and non-blocking, ensuring that the system remains responsive even under high load conditions.
-
-#### Main Event Loop Pseudocode
-
-```
-MAIN_EVENT_LOOP():
-    WHILE true:
-        // Check if there are requests to process
-        IF QUEUE_IS_EMPTY():
-            BREAK
-        
-        // Dequeue the next request
-        request = DEQUEUE_FROM_QUEUE()
-        
-        // Process the request (this includes re-queuing if needed)
-        PROCESS_REQUEST(request)
-        
-        // Continue with next request
-        CONTINUE
-```
-
-#### Multi-threaded Worker Loop
-
-```
-WORKER_THREAD():
-    WHILE true:
-        // Wait for work to become available
-        request = WAIT_FOR_REQUEST()
-        
-        // Process the request
-        PROCESS_REQUEST(request)
-        
-        // Return to waiting state
-        CONTINUE
-```
-
-## Multi-Threaded Implementation
-
-### Thread-Safe Coordination
-
-In a multi-threaded environment, the resource manager must coordinate access to its internal data structures while maintaining the non-blocking characteristics of the pattern. This coordination is achieved through careful use of atomic operations and minimal locking.
-
-The queue management uses traditional mutex-based synchronization, but only for the queue operations themselves. The critical resource acquisition path remains lock-free, ensuring that the performance benefits of the pattern are preserved.
-
-Worker threads continuously process requests from the shared queue, attempting to acquire resources and execute operations. The coordination between threads is handled implicitly through the atomic resource flags, eliminating the need for explicit thread synchronization in the critical path.
-
-### Worker Thread Behavior
-
-Worker threads operate in a continuous loop, processing requests as they become available. Each thread independently attempts to acquire resources and execute operations, creating natural parallelism without explicit coordination.
-
-The worker threads are designed to be lightweight and efficient, with minimal overhead beyond the actual resource acquisition and operation execution. This design allows the system to scale effectively with the number of available CPU cores.
-
-The thread coordination is handled through the shared queue and atomic resource flags, creating a self-balancing system that naturally distributes work across available threads.
-
-## Use Cases
-
-### Device Management Systems
-
-In device management systems, multiple operations may need to coordinate access to physical or logical devices. The adaptive resource acquisition pattern provides an elegant solution for managing these complex coordination requirements.
-
-For example, when attaching a device, the system may need to coordinate access to the device itself, the device registry, and various system resources. The pattern allows these operations to proceed atomically when resources are available, while gracefully handling contention through re-queuing.
-
-The device management system can handle complex multi-step operations that require coordination across multiple resources, all while maintaining high throughput and responsiveness.
-
-### Database Connection Pools
-
-Database connection pools are a natural fit for the adaptive resource acquisition pattern. Each database operation requires access to a connection from the pool, and the pattern provides efficient coordination without the overhead of traditional locking.
-
-The pattern allows the system to process multiple database operations concurrently when connections are available, while gracefully handling periods of high contention. The re-queuing mechanism ensures that no operations are lost, even during peak load periods.
-
-The connection pool can implement sophisticated scheduling policies, such as priority queuing for different types of operations, while maintaining the performance benefits of the pattern.
-
-## Performance Characteristics
-
-### Throughput Analysis
-
-The performance characteristics of the adaptive resource acquisition pattern are determined by the resource contention patterns in the system. In the best case, when resources are readily available, the system achieves maximum throughput with minimal overhead.
-
-In the worst case, when resources are heavily contended, the system gracefully degrades to a queuing behavior. The FIFO ordering of the queue gives rough fairness, but it does not by itself guarantee freedom from starvation: a request that needs several contended resources can be re-queued repeatedly while requests with smaller needs get through.
-
-The average case performance represents the typical operating conditions, where the system achieves optimal parallelism while handling occasional resource contention through re-queuing.
-
-### Comparison with Traditional Methods
-
-The adaptive resource acquisition pattern provides a unique combination of performance characteristics that are not achievable with traditional synchronization mechanisms:
-
- **Mutexes** provide data consistency but at the cost of thread blocking and context switching overhead
- **Spinlocks** avoid context switching but waste CPU cycles during contention
- **Pure queuing** avoids both blocking and CPU waste but serializes operations unnecessarily
-
-The adaptive pattern combines the best aspects of these approaches while avoiding their drawbacks, creating a solution that is both efficient and practical.
-
-## Advanced Features
-
-### Priority Queuing
-
-The system can be extended with priority queuing to handle different types of operations with varying importance. High-priority operations can be processed before lower-priority operations, ensuring that critical operations receive timely attention.
-
-The priority queuing mechanism integrates seamlessly with the existing re-queuing behavior, allowing the system to maintain its performance characteristics while providing sophisticated scheduling capabilities.
-
-### Resource Groups
-
-Complex operations may require coordination across multiple related resources. Resource groups allow the system to treat related resources as a single unit for acquisition purposes, simplifying the coordination logic for complex operations.
-
-Resource groups can be used to implement sophisticated resource management policies, such as ensuring that related resources are always acquired together or implementing resource reservation mechanisms.
-
-### Fairness Mechanisms
-
-The system can implement various fairness mechanisms to ensure that all requests receive fair treatment over time. Round-robin processing, aging mechanisms, and other fairness policies can be implemented while maintaining the performance benefits of the pattern.
-
-Fairness mechanisms are particularly important in systems where different types of operations have different resource requirements, ensuring that no operation type dominates the system resources.
-
-## Implementation Considerations
-
-### Memory Management
-
-The pattern requires careful attention to memory management, particularly for the request objects and resource metadata. Smart pointers and object pooling can be used to minimize memory allocation overhead and improve performance.
-
-The system should implement proper cleanup mechanisms for failed operations and ensure that resources are always released, even in error conditions.
-
-### Error Handling
-
-Robust error handling is essential for maintaining system reliability. The system should gracefully handle operation failures, resource unavailability, and other error conditions without affecting the overall system performance.
-
-Retry mechanisms with exponential backoff can be implemented for transient failures, while deadlock detection and resolution mechanisms can handle more complex failure scenarios.
-
-### Monitoring and Debugging
-
-The system should provide comprehensive monitoring capabilities to track performance metrics, resource utilization, and queue behavior. These metrics are essential for tuning the system and identifying performance bottlenecks.
-
-Debugging support should include detailed logging of resource acquisition attempts, queue operations, and operation execution, allowing developers to understand and optimize system behavior.
-
-## Conclusion
-
-The Adaptive Resource Acquisition pattern provides a novel solution to the classic synchronization dilemma. By combining atomic operations with intelligent re-queuing, it achieves maximum throughput while maintaining data consistency and avoiding the overhead of traditional synchronization mechanisms.
-
-This pattern is particularly well-suited for high-performance async systems where traditional synchronization mechanisms would create unacceptable overhead. The pattern's simplicity and effectiveness make it a valuable addition to the toolkit of concurrent programming patterns.
-
-The pattern represents a fundamental shift in how we think about resource coordination, treating it as a flow management problem rather than a blocking synchronization problem. This shift enables new levels of performance and scalability in concurrent systems.
-
-The adaptive resource acquisition pattern is particularly valuable in:
- High-performance async systems where throughput is critical
- Resource-constrained environments where CPU cycles are precious
- Systems requiring predictable latency and responsiveness
- Multi-threaded applications with complex shared state requirements
-
-By providing a practical solution to the synchronization dilemma, this pattern enables developers to build high-performance concurrent systems without sacrificing simplicity or reliability.
@@ -0,0 +1,59 @@
+# Spinqueueing: A new locking method that only blocks requests and not threads.
+
+The idea is that instead of using sleeplocks like mutexes, we only spin
+particular request objects by re-posting them to the queue.
+
+Particular requests may need a given shared resource. Instead of sleeping a
+whole thread while that particular request waits for the resource, we
+sleep the request itself by re-posting it into the thread's queue. This
+basically implements a kind of spinlock without busy-waiting. The underlying
+thread is never blocked unless it has no requests that can make forward
+progress.
+
+Forward progress through requests is only halted when an external resource is
+actually being waited on. Generally this will be an actual hardware event that
+is being waited on. No software bottlenecks will be slept on.
+
+All locks in the program are simple spinlocks, but the algorithm to spin on them
+is:
+
+## Each async call has a "locker and invoker":
+
+int funcThatCallsAnAsyncFunc(...)
+{
+	// Do preparatory stuff ...
+
+
+	// Post the lockvoker to the target thread.
+	targetThread.io_service.post(
+		[targetThread, /* args to asyncOperationReq captured here */]()
+		{
+			int nAcquired;
+			for (nAcquired=0; nAcquired<nLocksRequired; nAcquired++)
+			{
+				if (!requiredLocks[nAcquired].tryAcquire()) {
+					break;
+				}
+			}
+			if (nAcquired < nLocksRequired)
+			{
+				for (int i=0; i<nAcquired; i++) {
+					requiredLocks[i].release();
+				}
+				/* Unsure how to recapture the lambda object and re-enqueue it.
+				 * Dunno if that's even possible. But this is the essence of the
+				 * queue-spin system. We re-enqueue the lockvoker until it
+				 * gets all locks required. Then it will invoke the async
+				 * frontend.
+				 */
+				targetThread.io_service.post(this?);
+				return; // Don't fall through and invoke without the locks.
+			}
+
+			managerObject.asyncOperationReq(
+				/* args to asyncOperationReq passed here */);
+		}
+	);
+}
+
+## Idk how to encapsulate lockvokers into a terse, reusable idiom.
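
One possible shape for a reusable lockvoker, as a rough sketch rather than a settled design: put the lock list and the operation in a small heap-allocated object and let it re-post a `shared_ptr` to itself, which sidesteps the question of how a lambda can recapture itself. Everything named here (`TryLock`, `WorkQueue`, `Lockvoker`, `postLockvoker`) is hypothetical. `WorkQueue` is only a single-threaded stand-in for the `io_service` used above, and the sketch assumes the locks can be released as soon as the operation returns, which a real async frontend would more likely do from its completion handler.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical try-only spinlock: never blocks, just reports success/failure.
struct TryLock {
    std::atomic<bool> held{false};
    bool tryAcquire() { return !held.exchange(true, std::memory_order_acquire); }
    void release() { held.store(false, std::memory_order_release); }
};

// Hypothetical stand-in for io_service: a single-threaded FIFO of callables.
struct WorkQueue {
    std::deque<std::function<void()>> q;
    void post(std::function<void()> f) { q.push_back(std::move(f)); }
    // Runs one queued callable; returns false if the queue was empty.
    bool runOne() {
        if (q.empty()) return false;
        auto f = std::move(q.front());
        q.pop_front();  // pop before calling, so f may safely post again
        f();
        return true;
    }
};

// Locker + invoker: take all locks or back out and re-enqueue itself.
class Lockvoker : public std::enable_shared_from_this<Lockvoker> {
public:
    Lockvoker(WorkQueue &queue, std::vector<TryLock *> locks,
              std::function<void()> op)
        : queue_(queue), locks_(std::move(locks)), op_(std::move(op)) {}

    void run() {
        std::size_t nAcquired = 0;
        for (; nAcquired < locks_.size(); ++nAcquired) {
            if (!locks_[nAcquired]->tryAcquire()) break;
        }
        if (nAcquired < locks_.size()) {
            // Release only what we took, then "spin" by re-posting.
            for (std::size_t i = 0; i < nAcquired; ++i) locks_[i]->release();
            auto self = shared_from_this();
            queue_.post([self] { self->run(); });
            return;
        }
        op_();  // all locks held: invoke the operation
        // Assumption: the operation is finished when op_ returns.
        for (TryLock *lock : locks_) lock->release();
    }

private:
    WorkQueue &queue_;
    std::vector<TryLock *> locks_;
    std::function<void()> op_;
};

// Terse entry point: build the lockvoker and post its first attempt.
inline void postLockvoker(WorkQueue &queue, std::vector<TryLock *> locks,
                          std::function<void()> op) {
    auto lv = std::make_shared<Lockvoker>(queue, std::move(locks), std::move(op));
    queue.post([lv] { lv->run(); });
}
```

A call site then shrinks to something like `postLockvoker(queue, {&lockA, &lockB}, [=] { /* call the async frontend */ });`. The queued closure keeps the lockvoker alive through the `shared_ptr`, so it is freed once it finally runs and is not re-posted. The cost is one heap allocation per request, which fits the memory-for-throughput trade-off mentioned in the commit message.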