on 25/01/2012 23:52 Andriy Gapon said the following:
> on 24/01/2012 14:32 Gleb Smirnoff said the following:
>> Yes, now:
>> lock order reversal:
>> 1st 0xffffffff80937140 smp rendezvous (smp rendezvous) @ /usr/src/head/sys/kern/kern_shutdown.c:542
>> 2nd 0xfffffe0001f5d838 uart_hwmtx (uart_hwmtx) @ /usr/src/head/sys/dev/uart/uart_cpu.h:92
>> panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500
> OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no new
> It's still intriguing to me why the LOR *doesn't* happen [*] with
> stop_scheduler_on_panic=0. But I don't see a productive way to pursue this
> investigation further.
After your recent nudging I took yet another look at this issue and it seems that
I have some findings.
A short summary.
Prerequisites: an SMP x86 system, its kernel is configured with WITNESS &&
!WITNESS_SKIPSPIN (this is important) and a system uses serial console via uart.
Then, if stop_scheduler_on_panic is set to zero the system can be rebooted without
a problem. On the other hand, if stop_scheduler_on_panic is enabled, then the
system first runs into a LOR when calling printf() in cpu_reset() and then it runs
into a panic when printf is recursively called from witness(9) to report the LOR.
The panic happens because of the recursion on cnputs_mtx, which should ensure
that cnputs() output is not intermingled and which is not flagged to support
There are two things about this report that greatly confused and puzzled me:
1. stop_scheduler_on_panic variable is used _only_ in panic(9). So how could it
be possible that changing its value affects behavior of the system when panic(9)
is not called?!
2. The LOR in question happens between "smp rendezvous" (smp_ipi_mtx) and
"uart_hwmtx" (sc_hwmtx_s in uart core) spin locks. The order of these locks is
actually predefined in witness order_lists such that uart_hwmtx must come before
smp_ipi_mtx. But in the reboot path we first take smp_ipi_mtx in
shutdown_reset(), then we call cpu_reset, then it calls printf and from there we
get to uart_hwmtx for serial console output. So the order between these spinlocks
is always violated in the x86 SMP reboot path.
How come witness(9) doesn't _always_ detect this LOR?
How come it didn't detect this LOR before any "stop scheduler" commits?!
[Spoiler alert :-)]
Turns out that the two puzzles above are closely related.
Let's first list all the things that change when stop_scheduler_on_panic is
enabled and a panic happens:
- other CPUs are stopped (forced to spin)
- interrupts on current CPU are disabled
- by virtue of the above the current thread should be the only thread running
(unless it executes a voluntary switch)
- all locks are "busted", they are completely ignored / bypassed
- by virtue of the above no lock invariants and witness checks are performed, so
no lock order checking, no recursion checking, etc
So, what I observe is this: when stop_scheduler_on_panic is disabled, the LOR is
actually detected as it should be. witness(9) works properly here. Once the LOR
is detected witness(9) wants to report it using printf(9). That's where we run
into the cnputs_mtx recursion panic. It's all exactly as with
stop_scheduler_on_panic enabled. Then panic(9) wants to report the panic using
printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags() detects
locks recursion again (this is independent of witness(9)) and calls panic(9)
again. Then panic(9) wants to report the panic using printf(9)...
I assume that when the stack is exhausted we run into a double fault and
dblfault_handler wants to print something again... Likely we eventually run into
a triple fault which resets the machine. Although, I recall some old reports of
machines hanging during reboot in a place suspiciously close to where the
described ordeal happens...
But if the machine doesn't hang we get a full appearance of the reset successfully
happening (modulo some last messages missing).
With stop_scheduler_on_panic enabled and all the locks being ignored we, of
course, do not run into any secondary lock recursions and resulting panics. So
the system is able to at least panic "gracefully" (still we should have just
reported the LOR gracefully, no panic is required).
Some obvious conclusions:
- the "smp rendezvous" and "uart_hwmtx" LOR is real and it is the true cause of
the problem; it should be fixed one way or other - either by correcting witness
order_lists or by avoiding the LOR in shutdown_reset/cpu_reset;
- because witness(9) uses printf(9) to report problems, it is very fragile to use
witness with any locks that can be acquired under printf(9)
- stop_scheduler_on_panic just uncovers the true bug
There are probably other conclusions that can be made.
I welcome any suggestions on how to fix the problems discovered.