I've noticed a deadlock condition when using StackWalk64. It happens when the target thread is suspended inside an Rtl* function in the installable file system driver subsystem. Basically, if you suspend a thread while it's holding a lock in that layer and then call StackWalk64, the walker tends to go out to the file system (usually FindFirstFile looking for symbols), tries to take the same lock, and deadlocks. I provided a stub replacement function table callback, but that hasn't fixed it. Although I can use DoStackSnapshot to detect when CLR stack walks are unsafe, I can't see any way of knowing when the driver-layer lock is held. Any suggestions for how to avoid this problem would be appreciated.
Answers
Hi Promit,
StackWalk64() is a heavyweight stack unwind API because it can trigger symbol loading. So yes, it might deadlock for that reason. Using StackWalk64() (or MiniDumpWriteDump()) in-process is not recommended for similar reasons.
There is a lightweight RTL stack capture function, RtlCaptureStackBackTrace(), that will try to unwind the stack without the help of symbols. Please see http://msdn.microsoft.com/en-us/library/dd434873.aspx for more information.
Thanks, Shane
Marked As Answer by Jon Langdon MSFT, Owner, Saturday, August 08, 2009 4:39 PM
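As a rough illustration of the lightweight capture mentioned above, here is a minimal sketch in C. The helper names `capture_current_stack` and `print_current_stack` are hypothetical, not part of any Windows API. On Windows it calls CaptureStackBackTrace (the user-mode name for RtlCaptureStackBackTrace); the `backtrace()` branch is only an assumption-level fallback so the sketch also builds on non-Windows systems. Note that this records raw return addresses for the calling thread only and loads no symbols.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <execinfo.h>   /* fallback only, so the sketch builds off Windows */
#endif

/* Hypothetical helper: capture up to max_frames return addresses of the
   CALLING thread. No symbol loading takes place. Returns the frame count. */
unsigned capture_current_stack(void **frames, unsigned max_frames)
{
    assert(frames != NULL && max_frames > 0);
#ifdef _WIN32
    /* Skip 0 frames; the optional back-trace hash is not needed here. */
    return CaptureStackBackTrace(0, max_frames, frames, NULL);
#else
    int n = backtrace(frames, (int)max_frames);
    return n > 0 ? (unsigned)n : 0;
#endif
}

/* Hypothetical helper: print the raw addresses. Resolving them to symbol
   names can be deferred to a later, safer point (or done offline). */
void print_current_stack(void)
{
    void *frames[62];   /* older Windows versions require skip + capture < 63 */
    unsigned n = capture_current_stack(frames, 62);
    for (unsigned i = 0; i < n; ++i)
        printf("frame %u: %p\n", i, frames[i]);
}
```

Calling `print_current_stack()` from any function prints that thread's own call chain as addresses. It cannot be pointed at a different, suspended thread.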
-
All Replies
Quick note: the proper name of the function in user-mode code is CaptureStackBackTrace. The problem I see is that this function always walks the current thread, not an arbitrary suspended thread as StackWalk64 can. That makes it not especially helpful for a sampling profiler, unless I'm missing something.
Promit
-
Hi Promit,
You can create a child process that calls StackWalk64 to walk a suspended thread in the parent process, coordinated through IPC. Obviously, it won't work if the target thread is holding a machine-wide lock; I hope that's not the case. Did you mean ntfs.sys or a kernel driver when referring to the installable file system driver subsystem? I would like to know whether the lock that the target thread is holding could deadlock other users.
Thanks, Shane
Shane Yuan
-
I'm not sure exactly what lock is being held, but when a thread holding that lock is suspended, I haven't noticed anything that would suggest a system-wide lock. Maybe it's a specific heap that is taking a lock? Creating a child process isn't ideal by any means, but if that's the only option...
Promit
-
Hi Promit,
If you decide to try using StackWalk64 from another process, can you let us know whether it's still possible to deadlock when the target thread is suspended inside an Rtl* function in the installable file system driver subsystem? If it is, that means the target thread could be holding a system-wide lock.
Thanks, Shane
Shane Yuan
-
I'll mention it if I see it, but it's not a common deadlock by any means, so no promises that I'll manage to trap it in a debugger any time soon.
Promit
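For readers who want to try the child-process approach suggested in this thread, the following is a minimal sketch of the walker side in C, under several assumptions: it targets x64 Windows only, must be linked against dbghelp.lib, and the helper name `walk_remote_thread` is hypothetical. How the profiled process hands its process and thread IDs to the walker (the IPC part) is left out. Because symbol loading and any file system access happen in the walker process, a process-local lock held by the suspended thread should not block the walk; a machine-wide lock still could.

```c
#if defined(_WIN32) && (defined(_M_X64) || defined(__x86_64__))
#include <assert.h>
#include <windows.h>
#include <dbghelp.h>

/* Hypothetical helper, intended to run in a separate walker process.
   Suspends thread `tid` of process `pid`, records up to max_frames program
   counters into pcs, then resumes the thread.
   Returns the number of frames written, or -1 on failure. */
int walk_remote_thread(DWORD pid, DWORD tid, DWORD64 *pcs, int max_frames)
{
    int count = -1;
    HANDLE process, thread;

    assert(pcs != NULL && max_frames > 0);
    process = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pid);
    thread = OpenThread(THREAD_SUSPEND_RESUME | THREAD_GET_CONTEXT |
                        THREAD_QUERY_INFORMATION, FALSE, tid);

    if (process && thread && SuspendThread(thread) != (DWORD)-1) {
        CONTEXT ctx;
        ZeroMemory(&ctx, sizeof ctx);
        ctx.ContextFlags = CONTEXT_FULL;

        /* SymInitialize may load symbols from disk, but it does so in this
           walker process, not in the suspended target. */
        if (GetThreadContext(thread, &ctx) && SymInitialize(process, NULL, TRUE)) {
            STACKFRAME64 frame;
            ZeroMemory(&frame, sizeof frame);
            frame.AddrPC.Offset    = ctx.Rip;
            frame.AddrPC.Mode      = AddrModeFlat;
            frame.AddrFrame.Offset = ctx.Rbp;
            frame.AddrFrame.Mode   = AddrModeFlat;
            frame.AddrStack.Offset = ctx.Rsp;
            frame.AddrStack.Mode   = AddrModeFlat;

            count = 0;
            /* A NULL read routine makes StackWalk64 read the target's memory
               through the process handle. */
            while (count < max_frames &&
                   StackWalk64(IMAGE_FILE_MACHINE_AMD64, process, thread,
                               &frame, &ctx, NULL,
                               SymFunctionTableAccess64, SymGetModuleBase64, NULL) &&
                   frame.AddrPC.Offset != 0) {
                pcs[count++] = frame.AddrPC.Offset;
            }
            SymCleanup(process);
        }
        ResumeThread(thread);   /* always undo the suspension */
    }
    if (thread)  CloseHandle(thread);
    if (process) CloseHandle(process);
    return count;
}
#endif
```

A sampling profiler built this way would call `walk_remote_thread` once per sample for each thread of interest and keep the suspension window as short as possible. Whether this avoids the deadlock described above depends on the scope of the lock, which the thread leaves unresolved.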