test: Add kworker-hang test
This is the reproducer for the kworker hang bug. Reproduction Steps: 1) A user task calls io_uring_queue_exit(). 2) Suspend the task with SIGSTOP / SIGTRAP before the ring exit is finished (do it as soon as step (1) is done). 3) Wait for `/proc/sys/kernel/hung_task_timeout_secs` seconds elapsed. 4) We get a complaint from the khungtaskd because the kworker is stuck in an uninterruptible state (D). The kworkers waiting on ring exit are not progressing as the task cannot proceed. When the user task is continued (e.g. get SIGCONT after SIGSTOP, or continue after SIGTRAP breakpoint), the kworkers then can finish the ring exit. We need a special handling for this case to avoid khungtaskd complaint. Currently we don't have the fix for this. The dmesg says: [247390.432294] INFO: task kworker/u8:2:358488 blocked for more than 10 seconds. [247390.432314] Tainted: G OE 5.15.0-stable #5 [247390.432322] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [247390.432329] task:kworker/u8:2 state:D stack: 0 pid:358488 ppid: 2 flags:0x00004000 [247390.432341] Workqueue: events_unbound io_ring_exit_work [247390.432354] Call Trace: [247390.432368] __schedule+0x453/0x1850 [247390.432388] ? lock_acquire+0xc8/0x2d0 [247390.432404] ? usleep_range+0x90/0x90 [247390.432412] schedule+0x59/0xc0 [247390.432420] schedule_timeout+0x1aa/0x1f0 [247390.432429] ? mark_held_locks+0x49/0x70 [247390.432439] ? lockdep_hardirqs_on_prepare+0xff/0x180 [247390.432445] ? _raw_spin_unlock_irq+0x24/0x40 [247390.432456] __wait_for_common+0xc2/0x170 [247390.432473] io_ring_exit_work+0x1d9/0x750 [247390.432486] ? io_uring_del_tctx_node+0xe0/0xe0 [247390.432502] ? verify_cpu+0xf0/0x100 [247390.432520] process_one_work+0x23b/0x550 [247390.432540] worker_thread+0x55/0x3c0 [247390.432546] ? process_one_work+0x550/0x550 [247390.432556] kthread+0x140/0x160 [247390.432564] ? set_kthread_struct+0x40/0x40 [247390.432574] ret_from_fork+0x1f/0x30 [247390.432605] INFO: task kworker/u8:0:359615 blocked for more than 10 seconds. [247390.432613] Tainted: G OE 5.15.0-stable #5 [247390.432620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [247390.432626] task:kworker/u8:0 state:D stack: 0 pid:359615 ppid: 2 flags:0x00004000 [247390.432635] Workqueue: events_unbound io_ring_exit_work [247390.432643] Call Trace: [247390.432653] __schedule+0x453/0x1850 [247390.432676] ? usleep_range+0x90/0x90 [247390.432684] schedule+0x59/0xc0 [247390.432691] schedule_timeout+0x1aa/0x1f0 [247390.432700] ? mark_held_locks+0x49/0x70 [247390.432710] ? lockdep_hardirqs_on_prepare+0xff/0x180 [247390.432717] ? _raw_spin_unlock_irq+0x24/0x40 [247390.432728] __wait_for_common+0xc2/0x170 [247390.432744] io_ring_exit_work+0x1d9/0x750 [247390.432758] ? io_uring_del_tctx_node+0xe0/0xe0 [247390.432772] ? verify_cpu+0xf0/0x100 [247390.432788] process_one_work+0x23b/0x550 [247390.432807] worker_thread+0x55/0x3c0 [247390.432813] ? process_one_work+0x550/0x550 [247390.432824] kthread+0x140/0x160 [247390.432830] ? set_kthread_struct+0x40/0x40 [247390.432839] ret_from_fork+0x1f/0x30 [247390.432870] Showing all locks held in the system: [247390.432877] 1 lock held by khungtaskd/40: [247390.432880] #0: ffffffff82976700 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x174 [247390.432911] 1 lock held by in:imklog/922: [247390.432915] #0: ffff8881041cfcf0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4a/0x60 [247390.432977] 2 locks held by pager/318088: [247390.432981] #0: ffff8881208d4898 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x50 [247390.433001] #1: ffffc900010fd2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x49e/0x660 [247390.433024] 1 lock held by htop/341462: [247390.433032] 2 locks held by kworker/u8:2/358488: [247390.433035] #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550 [247390.433053] #1: ffffc90003797e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550 [247390.433071] 2 locks held by kworker/u8:0/359615: [247390.433075] #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550 [247390.433092] #1: ffffc90003597e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550 [247390.433110] 1 lock held by dmesg/361178: [247390.433113] #0: ffff88810b5300d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x4b/0x230 [247390.433134] ============================================= Cc: Pavel Begunkov <asml.silence@gmail.com> Link: https://github.com/axboe/liburing/issues/448 Signed-off-by:Ammar Faizi <ammar.faizi@students.amikom.ac.id> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
Please register or sign in to comment