Skip to content
Snippets Groups Projects
  1. Nov 09, 2021
    • Ammar Faizi's avatar
      test: Add kworker-hang test · 1fcdf578
      Ammar Faizi authored
      This is the reproducer for the kworker hang bug.
      
      Reproduction Steps:
        1) A user task calls io_uring_queue_exit().
      
        2) Suspend the task with SIGSTOP / SIGTRAP before the ring exit is
           finished (do it as soon as step (1) is done).
      
        3) Wait for `/proc/sys/kernel/hung_task_timeout_secs` seconds
           elapsed.
      
        4) We get a complaint from the khungtaskd because the kworker is
           stuck in an uninterruptible state (D).
      
      The kworkers waiting on ring exit are not progressing as the task
      cannot proceed. When the user task is continued (e.g. get SIGCONT
      after SIGSTOP, or continue after SIGTRAP breakpoint), the kworkers
      then can finish the ring exit.
      
      We need a special handling for this case to avoid khungtaskd
      complaint. Currently we don't have the fix for this.
      
      The dmesg says:
      
        [247390.432294] INFO: task kworker/u8:2:358488 blocked for more than 10 seconds.
        [247390.432314]       Tainted: G           OE     5.15.0-stable #5
        [247390.432322] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [247390.432329] task:kworker/u8:2    state:D stack:    0 pid:358488 ppid:     2 flags:0x00004000
        [247390.432341] Workqueue: events_unbound io_ring_exit_work
        [247390.432354] Call Trace:
        [247390.432368]  __schedule+0x453/0x1850
        [247390.432388]  ? lock_acquire+0xc8/0x2d0
        [247390.432404]  ? usleep_range+0x90/0x90
        [247390.432412]  schedule+0x59/0xc0
        [247390.432420]  schedule_timeout+0x1aa/0x1f0
        [247390.432429]  ? mark_held_locks+0x49/0x70
        [247390.432439]  ? lockdep_hardirqs_on_prepare+0xff/0x180
        [247390.432445]  ? _raw_spin_unlock_irq+0x24/0x40
        [247390.432456]  __wait_for_common+0xc2/0x170
        [247390.432473]  io_ring_exit_work+0x1d9/0x750
        [247390.432486]  ? io_uring_del_tctx_node+0xe0/0xe0
        [247390.432502]  ? verify_cpu+0xf0/0x100
        [247390.432520]  process_one_work+0x23b/0x550
        [247390.432540]  worker_thread+0x55/0x3c0
        [247390.432546]  ? process_one_work+0x550/0x550
        [247390.432556]  kthread+0x140/0x160
        [247390.432564]  ? set_kthread_struct+0x40/0x40
        [247390.432574]  ret_from_fork+0x1f/0x30
        [247390.432605] INFO: task kworker/u8:0:359615 blocked for more than 10 seconds.
        [247390.432613]       Tainted: G           OE     5.15.0-stable #5
        [247390.432620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [247390.432626] task:kworker/u8:0    state:D stack:    0 pid:359615 ppid:     2 flags:0x00004000
        [247390.432635] Workqueue: events_unbound io_ring_exit_work
        [247390.432643] Call Trace:
        [247390.432653]  __schedule+0x453/0x1850
        [247390.432676]  ? usleep_range+0x90/0x90
        [247390.432684]  schedule+0x59/0xc0
        [247390.432691]  schedule_timeout+0x1aa/0x1f0
        [247390.432700]  ? mark_held_locks+0x49/0x70
        [247390.432710]  ? lockdep_hardirqs_on_prepare+0xff/0x180
        [247390.432717]  ? _raw_spin_unlock_irq+0x24/0x40
        [247390.432728]  __wait_for_common+0xc2/0x170
        [247390.432744]  io_ring_exit_work+0x1d9/0x750
        [247390.432758]  ? io_uring_del_tctx_node+0xe0/0xe0
        [247390.432772]  ? verify_cpu+0xf0/0x100
        [247390.432788]  process_one_work+0x23b/0x550
        [247390.432807]  worker_thread+0x55/0x3c0
        [247390.432813]  ? process_one_work+0x550/0x550
        [247390.432824]  kthread+0x140/0x160
        [247390.432830]  ? set_kthread_struct+0x40/0x40
        [247390.432839]  ret_from_fork+0x1f/0x30
        [247390.432870]
                        Showing all locks held in the system:
        [247390.432877] 1 lock held by khungtaskd/40:
        [247390.432880]  #0: ffffffff82976700 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x174
        [247390.432911] 1 lock held by in:imklog/922:
        [247390.432915]  #0: ffff8881041cfcf0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4a/0x60
        [247390.432977] 2 locks held by pager/318088:
        [247390.432981]  #0: ffff8881208d4898 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x50
        [247390.433001]  #1: ffffc900010fd2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x49e/0x660
        [247390.433024] 1 lock held by htop/341462:
        [247390.433032] 2 locks held by kworker/u8:2/358488:
        [247390.433035]  #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
        [247390.433053]  #1: ffffc90003797e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
        [247390.433071] 2 locks held by kworker/u8:0/359615:
        [247390.433075]  #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
        [247390.433092]  #1: ffffc90003597e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
        [247390.433110] 1 lock held by dmesg/361178:
        [247390.433113]  #0: ffff88810b5300d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x4b/0x230
      
        [247390.433134] =============================================
      
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://github.com/axboe/liburing/issues/448
      
      
      Signed-off-by: default avatarAmmar Faizi <ammar.faizi@students.amikom.ac.id>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      1fcdf578
  2. Nov 06, 2021
  3. Nov 05, 2021
  4. Oct 31, 2021
  5. Oct 30, 2021
  6. Oct 25, 2021
  7. Oct 24, 2021
  8. Oct 21, 2021
  9. Oct 18, 2021
  10. Oct 11, 2021
  11. Oct 10, 2021
  12. Oct 07, 2021
  13. Oct 06, 2021
  14. Oct 05, 2021
  15. Oct 04, 2021
  16. Oct 03, 2021
  17. Sep 27, 2021
    • Ammar Faizi's avatar
      test: Fix endianess issue on `bind()` and `connect()` · ce105386
      Ammar Faizi authored
      When we call `bind()` or `connect()`, the `addr` and `port` should be
      in big endian value representation.
      
      Portability notes:
        - Do not hard code the address with integer like this `0x0100007fU`.
          Instead, use `inet_addr("127.0.0.1")` to ensure portability for
          the machine with different endianess. It's also cleaner and more
          readable to use `inet_addr()` from `#include <arpa/inet.h>`.
      
        - Use `htons(port_number)` to make sure the port_number is properly
          choosen instead directly assign it with integer (still about
          endianess problem).
      
      This commit fixes endianess issue in these files:
        test/232c93d07b74-test.c
        test/accept-link.c
        test/accept.c
        test/poll-link.c
        test/shutdown.c
        test/socket-rw-eagain.c
        test/socket-rw.c
      
      Fixes: 08bd8151 ("Un-DOSify test/232c93d07b74-test.c")
      Fixes: 4bce856d ("Improve reliability of poll/accept-link tests")
      Fixes: 76e3b792 ("Add nonblock empty socket read test")
      Fixes: 7de62535 ("test/accept: code reuse cleanup")
      Fixes: 8a3a8d74 ("Add IORING_OP_SHUTDOWN test case")
      Fixes: e3adbfc2
      
       ("Moves function calls out of assert().")
      Suggested-by: default avatarLouvian Lyndal <louvianlyndal@gmail.com>
      Signed-off-by: default avatarAmmar Faizi <ammarfaizi2@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ce105386
  18. Sep 26, 2021
  19. Sep 24, 2021