Skip to content
Snippets Groups Projects
Commit 1fcdf578 authored by Ammar Faizi's avatar Ammar Faizi Committed by Jens Axboe
Browse files

test: Add kworker-hang test

This is the reproducer for the kworker hang bug.

Reproduction Steps:
  1) A user task calls io_uring_queue_exit().

  2) Suspend the task with SIGSTOP / SIGTRAP before the ring exit is
     finished (do it as soon as step (1) is done).

  3) Wait for `/proc/sys/kernel/hung_task_timeout_secs` seconds
     elapsed.

  4) We get a complaint from the khungtaskd because the kworker is
     stuck in an uninterruptible state (D).

The kworkers waiting on ring exit are not progressing as the task
cannot proceed. When the user task is continued (e.g. get SIGCONT
after SIGSTOP, or continue after SIGTRAP breakpoint), the kworkers
then can finish the ring exit.

We need a special handling for this case to avoid khungtaskd
complaint. Currently we don't have the fix for this.

The dmesg says:

  [247390.432294] INFO: task kworker/u8:2:358488 blocked for more than 10 seconds.
  [247390.432314]       Tainted: G           OE     5.15.0-stable #5
  [247390.432322] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [247390.432329] task:kworker/u8:2    state:D stack:    0 pid:358488 ppid:     2 flags:0x00004000
  [247390.432341] Workqueue: events_unbound io_ring_exit_work
  [247390.432354] Call Trace:
  [247390.432368]  __schedule+0x453/0x1850
  [247390.432388]  ? lock_acquire+0xc8/0x2d0
  [247390.432404]  ? usleep_range+0x90/0x90
  [247390.432412]  schedule+0x59/0xc0
  [247390.432420]  schedule_timeout+0x1aa/0x1f0
  [247390.432429]  ? mark_held_locks+0x49/0x70
  [247390.432439]  ? lockdep_hardirqs_on_prepare+0xff/0x180
  [247390.432445]  ? _raw_spin_unlock_irq+0x24/0x40
  [247390.432456]  __wait_for_common+0xc2/0x170
  [247390.432473]  io_ring_exit_work+0x1d9/0x750
  [247390.432486]  ? io_uring_del_tctx_node+0xe0/0xe0
  [247390.432502]  ? verify_cpu+0xf0/0x100
  [247390.432520]  process_one_work+0x23b/0x550
  [247390.432540]  worker_thread+0x55/0x3c0
  [247390.432546]  ? process_one_work+0x550/0x550
  [247390.432556]  kthread+0x140/0x160
  [247390.432564]  ? set_kthread_struct+0x40/0x40
  [247390.432574]  ret_from_fork+0x1f/0x30
  [247390.432605] INFO: task kworker/u8:0:359615 blocked for more than 10 seconds.
  [247390.432613]       Tainted: G           OE     5.15.0-stable #5
  [247390.432620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [247390.432626] task:kworker/u8:0    state:D stack:    0 pid:359615 ppid:     2 flags:0x00004000
  [247390.432635] Workqueue: events_unbound io_ring_exit_work
  [247390.432643] Call Trace:
  [247390.432653]  __schedule+0x453/0x1850
  [247390.432676]  ? usleep_range+0x90/0x90
  [247390.432684]  schedule+0x59/0xc0
  [247390.432691]  schedule_timeout+0x1aa/0x1f0
  [247390.432700]  ? mark_held_locks+0x49/0x70
  [247390.432710]  ? lockdep_hardirqs_on_prepare+0xff/0x180
  [247390.432717]  ? _raw_spin_unlock_irq+0x24/0x40
  [247390.432728]  __wait_for_common+0xc2/0x170
  [247390.432744]  io_ring_exit_work+0x1d9/0x750
  [247390.432758]  ? io_uring_del_tctx_node+0xe0/0xe0
  [247390.432772]  ? verify_cpu+0xf0/0x100
  [247390.432788]  process_one_work+0x23b/0x550
  [247390.432807]  worker_thread+0x55/0x3c0
  [247390.432813]  ? process_one_work+0x550/0x550
  [247390.432824]  kthread+0x140/0x160
  [247390.432830]  ? set_kthread_struct+0x40/0x40
  [247390.432839]  ret_from_fork+0x1f/0x30
  [247390.432870]
                  Showing all locks held in the system:
  [247390.432877] 1 lock held by khungtaskd/40:
  [247390.432880]  #0: ffffffff82976700 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x174
  [247390.432911] 1 lock held by in:imklog/922:
  [247390.432915]  #0: ffff8881041cfcf0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4a/0x60
  [247390.432977] 2 locks held by pager/318088:
  [247390.432981]  #0: ffff8881208d4898 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x50
  [247390.433001]  #1: ffffc900010fd2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x49e/0x660
  [247390.433024] 1 lock held by htop/341462:
  [247390.433032] 2 locks held by kworker/u8:2/358488:
  [247390.433035]  #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433053]  #1: ffffc90003797e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433071] 2 locks held by kworker/u8:0/359615:
  [247390.433075]  #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433092]  #1: ffffc90003597e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433110] 1 lock held by dmesg/361178:
  [247390.433113]  #0: ffff88810b5300d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x4b/0x230

  [247390.433134] =============================================

Cc: Pavel Begunkov <asml.silence@gmail.com>
Link: https://github.com/axboe/liburing/issues/448


Signed-off-by: default avatarAmmar Faizi <ammar.faizi@students.amikom.ac.id>
Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
parent 7b5681f8
Branches issue448
No related merge requests found
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment