diff --git a/scripts/udpCommandDemuxer/gdb-dump-segfault.txt b/scripts/udpCommandDemuxer/gdb-dump-segfault.txt
new file mode 100644
index 0000000..be178f4
--- /dev/null
+++ b/scripts/udpCommandDemuxer/gdb-dump-segfault.txt
@@ -0,0 +1,171 @@
+==========================================
+Iteration 67 - Thu Oct 30 08:41:13 PM AST 2025
+==========================================
+
+GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
+Copyright (C) 2024 Free Software Foundation, Inc.
+License GPLv3+: GNU GPL version 3 or later
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+Type "show copying" and "show warranty" for details.
+This GDB was configured as "x86_64-linux-gnu".
+Type "show configuration" for configuration details.
+For bug reporting instructions, please see:
+.
+Find the GDB manual and other documentation resources online at:
+    .
+
+For help, type "help".
+Type "apropos word" to search for commands related to "word"...
+Reading symbols from salmanoff...
+SIGINT is used by the debugger.
+Are you sure you want to change it? (y or n) [answered Y; input not from terminal]
+Waiting 2281ms before sending SIGINT...
+Starting program...
+
+This GDB supports auto-downloading debuginfo from the following URLs:
+
+Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
+Debuginfod has been disabled.
+To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
+[Thread debugging using libthread_db enabled]
+Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
+[New Thread 0x7ffff77ff6c0 (LWP 805031)]
+CRT:main: about to JOLT Mrntt with cmdline args
+main: Waiting for command line JOLT
+Mrntt:operator():JOLTED: setting cmdline args
+main: salmanoff 0.01.000
+main: DAP Specs:
+DAP Spec Files: devices/bodies/dell-laptop.daps
+Stim Buff API Library Paths: commonLibs/livoxProto1/ commonLibs/xcbXorg/ stimBuffApis/xcbWindow/ stimBuffApis/livoxGen1/
+Stim Buff API Libraries: libxcbWindow.so liblivoxGen1.so
+
+initializeSalmanoff: Entered.
+
+main: Entering event loop
+[New Thread 0x7ffff6ffe6c0 (LWP 805032)]
+[New Thread 0x7ffff67fd6c0 (LWP 805033)]
+[New Thread 0x7ffff5ffc6c0 (LWP 805034)]
+[New Thread 0x7ffff57fb6c0 (LWP 805035)]
+[New Thread 0x7ffff4ffa6c0 (LWP 805036)]
+distributeAndPinThreadsAcrossCpus: Distributed 5 threads across 4 CPUs
+joltThreadReq1_posted: Thread 'director': handling JOLT request.
+joltThreadReq1_posted: Thread 'simulator': handling JOLT request.
+joltThreadReq1_posted: Thread 'subconscious': handling JOLT request.
+joltThreadReq1_posted: Thread 'body': handling JOLT request.
+body:main: Entering event loop
+simulator:main: Entering event loop
+subconscious:main: Entering event loop
+joltThreadReq1_posted: Thread 'world': handling JOLT request.
+world:main: Entering event loop
+Mrntt: All mind threads JOLTed.
+director:main: Entering event loop
+startThreadReq1_posted: Thread 'director': handling startThread.
+startThreadReq1_posted: Thread 'body': handling startThread.
+startThreadReq1_posted: Thread 'simulator': handling startThread.
+startThreadReq1_posted: Thread 'world': handling startThread.
+startThreadReq1_posted: Thread 'subconscious': handling startThread.
+Mrntt: All mind threads started.
+Library Path: libxcbWindow.so
+Stim Buff API Descriptor: Name: xcb
+Exported QualeIface APIs:
+  - visual-qualeiface
+
+
+Library Path: liblivoxGen1.so
+Stim Buff API Descriptor: Name: livoxGen1
+Exported QualeIface APIs:
+  - pcloud
+  - pcloudIntensity
+  - gyro
+  - accel
+
+
+
+start: BroadcastListener started on port 55000
+start: UDP Command Demuxer started on port 56001
+attachStimBuffDeviceReq1_posted: Attaching edev win0 to world thread
+xcbWindow_attachDeviceReq: Attached X11 window:
+  Display: 1, Screen: 0, MatchType: substring, Target: "mut", Found: "mutter guard window" (matched substring 'mut')
+attachStimBuffDeviceReq1_posted: Attaching edev avia0 to world thread
+getOrCreateDeviceReq1: Connection failed for device 3JEDK380010Z39
+attachDeviceReq1: Failed to create/find Livox device: 3JEDK380010Z39
+newDeviceAttachmentSpecInd2: Attach failed for device spec Device Identifier: avia0, Sensor Type: e, QualeIface API: structural-qualeiface, StimBuff API: livoxGen1, StimBuff API Params: (), Provider: livoxProto1, Provider Params: (), Device Selector: 3JEDK380010Z39
+
+attachAllUnattachedDevicesFromReq2: Failed to attach device: avia0
+Mrntt: attached 1 of 2 sense devices.
+Mrntt: Body component initialized.
+negtrinEventInd: Handling negtrin event.
+marionetteInitializeReqCb: Marionette initialized.
+broadcastMsgInd: Discovered new Livox device: DiscoveredDevice{identifier='3JEDK380010Z391', ipAddr='10.42.0.139', deviceType=7 (Avia)}
+attachStimBuffDeviceReq1_posted: Attaching edev avia0 to world thread
+attachDeviceReq1: Successfully attached/found Livox device: 3JEDK380010Z39 (ID: avia0)
+Sending SIGINT to program (PID: 805028)...
+SIGINT (Ctrl+C) received. Initiating shutdown...
+Mrntt: About to detach all sense devices.
+xcbWindow_detachDeviceReq: Detached X11 window device:
+Device Identifier: win0, Sensor Type: e, QualeIface API: visual-qualeiface, StimBuff API: xcb, StimBuff API Params: (dev-substring ), Provider: xorg, Provider Params: (display=1 screen=0 ), Device Selector: mut
+
+Mrntt: Successfully detached 1 of 1 sense devices.
+Mrntt: About to finalize all stim buff api libs.
+stop: UDP Command Demuxer stopped
+stop: BroadcastListener stopped
+broadcastMsgInd: Error receiving broadcast message: Operation canceled
+Mrntt: About to unload all stim buff api libs.
+
+Thread 7 "salmanoff" received signal SIGSEGV, Segmentation fault.
+[Switching to Thread 0x7ffff4ffa6c0 (LWP 805036)]
+0x0000000000000000 in ?? ()
+
+=== SEGFAULT DETECTED ===
+#0 0x0000000000000000 in ?? ()
+#1 0x00007ffff7ace057 in smo::stim_buff::AttachDeviceReq::attachDeviceReq2 (this=0x7ffff00098a0,
+    context=std::shared_ptr (use count 4, weak count 1) = {...}, error=...) at /home/latentprion/gits/salmanoff-git/stimBuffApis/livoxGen1/livoxGen1.cpp:160
+#2 0x00007ffff7ae6584 in std::__invoke_impl, boost::system::error_code const&), smo::stim_buff::AttachDeviceReq*&, std::shared_ptr&, boost::system::error_code const&> (
+    __f=@0x7ffff4ff9a80: (void (smo::stim_buff::AttachDeviceReq::*)(smo::stim_buff::AttachDeviceReq * const, std::shared_ptr, const boost::system::error_code &)) 0x7ffff7acde68 , boost::system::error_code const&)>,
+    __t=@0x7ffff4ff9aa0: 0x7ffff00098a0) at /usr/include/c++/13/bits/invoke.h:74
+#3 0x00007ffff7ae42b1 in std::__invoke, boost::system::error_code const&), smo::stim_buff::AttachDeviceReq*&, std::shared_ptr&, boost::system::error_code const&> (
+    __fn=@0x7ffff4ff9a80: (void (smo::stim_buff::AttachDeviceReq::*)(smo::stim_buff::AttachDeviceReq * const, std::shared_ptr, const boost::system::error_code &)) 0x7ffff7acde68 , boost::system::error_code const&)>)
+    at /usr/include/c++/13/bits/invoke.h:96
+#4 0x00007ffff7ae1fe1 in std::_Bind, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>::__call(std::tuple&&, std::_Index_tuple<0ul, 1ul, 2ul>) (this=0x7ffff4ff9a80, __args=...) at /usr/include/c++/13/functional:506
+#5 0x00007ffff7ade79a in std::_Bind, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>::operator()(boost::system::error_code const&) (this=0x7ffff4ff9a80)
+    at /usr/include/c++/13/functional:591
+#6 0x00007ffff7aec999 in boost::asio::detail::binder1, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>, boost::system::error_code>::operator()() (this=0x7ffff4ff9a80)
+    at /usr/include/boost/asio/detail/bind_handler.hpp:171
+#7 0x00007ffff7aebd0e in boost::asio::asio_handler_invoke, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>, boost::system::error_code> >(boost::asio::detail::binder1, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>, boost::system::error_code>&, ...) (function=...) at /usr/include/boost/asio/handler_invoke_hook.hpp:88
+#8 0x00007ffff7aea450 in boost_asio_handler_invoke_helpers::invoke, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>, boost::system::error_code>, std::_Bind, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)> >(boost::asio::detail::binder1, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>, boost::system::error_code>&, std::_Bind, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>&) (function=..., context=...) at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:54
+#9 0x00007ffff7ae8790 in boost::asio::detail::handler_work, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>, boost::asio::any_io_executor, void>::complete, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>, boost::system::error_code> >(boost::asio::detail::binder1, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>, boost::system::error_code>&, std::_Bind, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>&) (this=0x7ffff4ff9a40, function=..., handler=...) at /usr/include/boost/asio/detail/handler_work.hpp:524
+#10 0x00007ffff7ae6986 in boost::asio::detail::wait_handler, std::_Placeholder<1>))(std::shared_ptr, boost::system::error_code const&)>, boost::asio::any_io_executor>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) (owner=0x7ffff0007970, base=0x7fffe400a180) at /usr/include/boost/asio/detail/wait_handler.hpp:76
+#11 0x000055555556d35e in boost::asio::detail::scheduler_operation::complete (this=0x7fffe400a180, owner=0x7ffff0007970, ec=..., bytes_transferred=0)
+    at /usr/include/boost/asio/detail/scheduler_operation.hpp:40
+#12 0x00005555555706e7 in boost::asio::detail::scheduler::do_run_one (this=0x7ffff0007970, lock=..., this_thread=..., ec=...) at /usr/include/boost/asio/detail/impl/scheduler.ipp:493
+#13 0x00005555555700b9 in boost::asio::detail::scheduler::run (this=0x7ffff0007970, ec=...) at /usr/include/boost/asio/detail/impl/scheduler.ipp:210
+#14 0x0000555555570a9d in boost::asio::io_context::run (this=0x7ffff0007900) at /usr/include/boost/asio/impl/io_context.ipp:64
+#15 0x00005555555f6b10 in smo::MindThread::main (self=...) at /home/latentprion/gits/salmanoff-git/smocore/componentThread.cpp:82
+#16 0x00005555555f4ed3 in std::__invoke_impl > (
+    __f=@0x7ffff0007bf0: 0x5555555f6984 ) at /usr/include/c++/13/bits/invoke.h:61
+#17 0x00005555555f4e41 in std::__invoke > (
+    __fn=@0x7ffff0007bf0: 0x5555555f6984 ) at /usr/include/c++/13/bits/invoke.h:96
+#18 0x00005555555f4d2f in std::thread::_Invoker > >::_M_invoke<0ul, 1ul> (this=0x7ffff0007be8)
+    at /usr/include/c++/13/bits/std_thread.h:292
+#19 0x00005555555f4c88 in std::thread::_Invoker > >::operator() (this=0x7ffff0007be8)
+    at /usr/include/c++/13/bits/std_thread.h:299
+#20 0x00005555555f4bdc in std::thread::_State_impl > > >::_M_run (this=0x7ffff0007be0)
+    at /usr/include/c++/13/bits/std_thread.h:244
+#21 0x00007ffff7cecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
+#22 0x00007ffff789caa4 in start_thread (arg=) at ./nptl/pthread_create.c:447
+#23 0x00007ffff7929c6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
+
+=== GDB is now interactive - you can inspect the state ===
+[Thread 0x7ffff4ffa6c0 (LWP 805036) exited]
+[Thread 0x7ffff57fb6c0 (LWP 805035) exited]
+[Thread 0x7ffff5ffc6c0 (LWP 805034) exited]
+[Thread 0x7ffff67fd6c0 (LWP 805033) exited]
+[Thread 0x7ffff6ffe6c0 (LWP 805032) exited]
+[Thread 0x7ffff7f5f780 (LWP 805028) exited]
+[Thread 0x7ffff77ff6c0 (LWP 805031) exited]
+[New process 805028]
+
+Program terminated with signal SIGSEGV, Segmentation fault.
+The program no longer exists.
+(gdb) 
diff --git a/scripts/udpCommandDemuxer/gdb_heisenbug.gdb b/scripts/udpCommandDemuxer/gdb_heisenbug.gdb
new file mode 100644
index 0000000..c60966b
--- /dev/null
+++ b/scripts/udpCommandDemuxer/gdb_heisenbug.gdb
@@ -0,0 +1,120 @@
+# GDB command file for reproducing the UdpCommandDemuxer heisenbug
+# This script runs salmanoff, waits a random time, sends SIGINT, and catches segfaults
+
+# Disable the pager so output doesn't pause for user input
+set pagination off
+
+# Set up signal handling - stop and print on segfaults
+handle SIGSEGV stop print
+# Let SIGINT pass through to the program silently
+# nostop: don't stop execution, noprint: don't print a message, pass: deliver it to the program
+handle SIGINT nostop noprint pass
+
+# Use Python to handle stop/exit events automatically and to inject SIGINT
+python
+import time
+import random
+import threading
+import os
+import signal
+
+sigint_thread_started = False
+segfault_detected = False
+
+def log_from_thread(msg):
+    # GDB's Python API is not thread-safe, but gdb.post_event() is:
+    # the callback runs later on GDB's main thread
+    gdb.post_event(lambda: print(msg))
+
+def send_sigint_after_delay(pid, delay_ms):
+    # Runs on a worker thread, so it reaches GDB only through log_from_thread()
+    time.sleep(delay_ms / 1000.0)
+
+    # Send SIGINT directly to the process using its PID
+    # This works even while the program is running (unlike gdb.execute("signal SIGINT"))
+    try:
+        log_from_thread(f"Sending SIGINT to program (PID: {pid})...")
+        os.kill(pid, signal.SIGINT)
+    except Exception as e:
+        log_from_thread(f"Failed to send SIGINT: {e}")
+
+def start_sigint_thread():
+    # Runs on GDB's main thread, so reading the inferior's PID here is safe
+    global sigint_thread_started
+    if sigint_thread_started:
+        return
+    pid = gdb.selected_inferior().pid
+    if pid <= 0:
+        # pid is 0 when no process is running; os.kill(0, ...) would signal
+        # GDB's whole process group instead of the program
+        return
+    sigint_thread_started = True
+    # Wait a random number of milliseconds between 2000 and 3000
+    delay_ms = random.randint(2000, 3000)
+    print(f"Waiting {delay_ms}ms before sending SIGINT...")
+    thread = threading.Thread(target=send_sigint_after_delay, args=(pid, delay_ms), daemon=True)
+    thread.start()
+
+# Hook to check the stop reason and handle segfaults
+def stop_handler(event):
+    global segfault_detected
+    if isinstance(event, gdb.SignalEvent) and event.stop_signal == "SIGSEGV":
+        segfault_detected = True
+        print("\n=== SEGFAULT DETECTED ===")
+        gdb.execute("bt")
+        print("\n=== GDB is now interactive - you can inspect the state ===")
+        # Don't quit or continue - stay in interactive mode
+    # SIGINT never reaches this handler because of "handle SIGINT nostop"
+
+# Exit events are emitted on gdb.events.exited, not on gdb.events.stop
+def exit_handler(event):
+    # exit_code is absent when GDB cannot determine it
+    exit_code = getattr(event, "exit_code", None)
+    if exit_code is None:
+        print("\nProgram exited (exit code unavailable)")
+    elif exit_code == 0:
+        print("\nProgram exited normally. Continuing loop...")
+    else:
+        print(f"\nProgram exited with code {exit_code}")
+
+# Hook for when the program continues or starts running
+def cont_handler(event):
+    # Start the SIGINT thread on the first resume, once the program's PID is known
+    start_sigint_thread()
+
+# Register event handlers
+gdb.events.stop.connect(stop_handler)
+gdb.events.exited.connect(exit_handler)
+gdb.events.cont.connect(cont_handler)
+end
+
+# Start the program
+echo Starting program...\n
+run
+
+# After run returns: stay in GDB after a segfault, continue if the program
+# is merely stopped, and quit if it has exited
+python
+try:
+    inferior = gdb.selected_inferior()
+    if segfault_detected:
+        # Continuing would deliver SIGSEGV and kill the process, destroying
+        # the state we want to inspect, so leave it stopped
+        pass
+    elif inferior.pid > 0 and inferior.threads():
+        # Program is still alive - continue execution
+        # The SIGINT thread is already running and will send the signal when ready
+        gdb.execute("continue", False)
+    else:
+        # No live process - the program has exited
+        print("\nProgram has exited.")
+        gdb.execute("quit", False)
+except Exception as e:
+    print(f"Error checking program state: {e}")
+    # If we can't determine the state, try to quit
+    try:
+        gdb.execute("quit", False)
+    except Exception:
+        pass
+end
+
diff --git a/scripts/udpCommandDemuxer/reproduce_heisenbug.sh b/scripts/udpCommandDemuxer/reproduce_heisenbug.sh
new file mode 100755
index 0000000..1df0c36
--- /dev/null
+++ b/scripts/udpCommandDemuxer/reproduce_heisenbug.sh
@@ -0,0 +1,114 @@
+#!/bin/bash
+# Script to reproduce the UdpCommandDemuxer race condition heisenbug
+# Runs salmanoff in GDB repeatedly, injecting SIGINT at random intervals
+#
+# Usage: ./reproduce_heisenbug.sh [WORKING_DIR]
+#   WORKING_DIR: Directory containing the salmanoff binary; all paths are relative to it
+#                If not provided, uses the WORKING_DIR environment variable, or defaults to the project root
+#
+# Environment variables:
+#   WORKING_DIR: Working directory (can be overridden by the command-line argument)
+
+# Get the directory where this script is located
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+
+# Determine working directory (command-line arg > env var > default)
+if [ -n "$1" ]; then
+    WORKING_DIR="$1"
+elif [ -n "$WORKING_DIR" ]; then
+    # Use environment variable
+    :
+else
+    # Default to project root
+    WORKING_DIR="$PROJECT_ROOT"
+fi
+
+# Check that the working directory exists before resolving it;
+# a failed "cd" below would otherwise leave WORKING_DIR empty
+if [ ! -d "$WORKING_DIR" ]; then
+    echo "Error: Working directory does not exist: $WORKING_DIR" >&2
+    exit 1
+fi
+
+# Convert to absolute path
+WORKING_DIR="$(cd "$WORKING_DIR" && pwd)"
+
+# Paths: the binary lives in the working directory, the GDB script next to this script
+SALMANOFF_BINARY="$WORKING_DIR/salmanoff"
+GDB_SCRIPT="$SCRIPT_DIR/gdb_heisenbug.gdb"
+
+# Check if binary exists
+if [ ! -f "$SALMANOFF_BINARY" ]; then
+    echo "Error: salmanoff binary not found at $SALMANOFF_BINARY" >&2
+    echo "Working directory: $WORKING_DIR" >&2
+    exit 1
+fi
+
+# Check if GDB script exists
+if [ ! -f "$GDB_SCRIPT" ]; then
+    echo "Error: GDB script not found at $GDB_SCRIPT" >&2
+    exit 1
+fi
+
+# Command line arguments for salmanoff
+SALMANOFF_ARGS=(
+    -p commonLibs/livoxProto1/
+    -p commonLibs/xcbXorg/
+    -p stimBuffApis/xcbWindow/
+    -p stimBuffApis/livoxGen1/
+    -a libxcbWindow.so
+    -a liblivoxGen1.so
+    -d devices/bodies/dell-laptop.daps
+)
+
+echo "=== UdpCommandDemuxer Heisenbug Reproduction Script ==="
+echo "Working Directory: $WORKING_DIR"
+echo "Binary: $SALMANOFF_BINARY"
+echo "GDB Script: $GDB_SCRIPT"
+echo "Arguments: ${SALMANOFF_ARGS[*]}"
+echo ""
+echo "Press Ctrl+C to stop the loop"
+echo ""
+
+# Change to working directory so all relative paths are resolved correctly
+cd "$WORKING_DIR" || exit 1
+
+# Loop counter
+ITERATION=0
+
+# Main loop
+while true; do
+    ITERATION=$((ITERATION + 1))
+    echo "=========================================="
+    echo "Iteration $ITERATION - $(date)"
+    echo "=========================================="
+    echo ""
+
+    # Run GDB with the command file
+    # GDB stays interactive on a segfault and exits on normal completion
+    # When GDB stays interactive (on segfault), this waits for the user to quit GDB
+    # When GDB exits normally (program completed), its exit code is 0 and the loop continues
+    # Note: We use a relative path to the salmanoff binary since we're already in WORKING_DIR
+    SALMANOFF_RELATIVE="salmanoff"
+    if gdb -x "$GDB_SCRIPT" --args "$SALMANOFF_RELATIVE" "${SALMANOFF_ARGS[@]}"; then
+        # GDB exited successfully (program completed normally)
+        EXIT_CODE=0
+    else
+        # GDB exited with an error (unexpected exit or user interrupt);
+        # in the else branch, $? still holds gdb's exit status
+        EXIT_CODE=$?
+        echo ""
+        echo "GDB exited with code $EXIT_CODE"
+        # Exit code 130 (128 + SIGINT) means the user pressed Ctrl+C, which is expected
+        if [ "$EXIT_CODE" -ne 130 ]; then
+            echo "Unexpected GDB exit - check output above"
+        fi
+    fi
+
+    echo ""
+    echo "Iteration $ITERATION complete. Starting next iteration in 1 second..."
+    sleep 1
+    echo ""
+done
+