Reduce the stack usage of dsl_dataset_remove_clones_key

Description

dsl_dataset_remove_clones_key is recursive, so a deep recursion can
overrun the 8KB Linux kernel stack. I have seen this happen in a real
deployment, and subsequently confirmed it by running a test workload
on a custom-built kernel that uses a 32KB stack.

The following stack trace is an example of a case that would have
overrun the 8KB kernel stack:

      Depth    Size   Location    (42 entries)
      -----    ----   --------
0)    11192      72   __kmalloc+0x2e/0x240
1)    11120     144   kmem_alloc_debug+0x20e/0x500
2)    10976      72   dbuf_hold_impl+0x4a/0xa0
3)    10904     120   dbuf_prefetch+0xd3/0x280
4)    10784      80   dmu_zfetch_dofetch.isra.5+0x10f/0x180
5)    10704     240   dmu_zfetch+0x5f7/0x10e0
6)    10464     168   dbuf_read+0x71e/0x8f0
7)    10296     104   dnode_hold_impl+0x1ee/0x620
8)    10192      16   dnode_hold+0x19/0x20
9)    10176      88   dmu_buf_hold+0x42/0x1b0
10)   10088     144   zap_lockdir+0x48/0x730
11)    9944     128   zap_cursor_retrieve+0x1c4/0x2f0
12)    9816     392   dsl_dataset_remove_clones_key.isra.14+0xab/0x190
13)    9424     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
14)    9032     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
15)    8640     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
16)    8248     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
17)    7856     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
18)    7464     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
19)    7072     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
20)    6680     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
21)    6288     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
22)    5896     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
23)    5504     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
24)    5112     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
25)    4720     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
26)    4328     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
27)    3936     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
28)    3544     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
29)    3152     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
30)    2760     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
31)    2368     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
32)    1976     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
33)    1584     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
34)    1192     232   dsl_dataset_destroy_sync+0x311/0xf60
35)     960      72   dsl_sync_task_group_sync+0x12f/0x230
36)     888     168   dsl_pool_sync+0x48b/0x5c0
37)     720     184   spa_sync+0x417/0xb00
38)     536     184   txg_sync_thread+0x325/0x5b0
39)     352      48   thread_generic_wrapper+0x7a/0x90
40)     304     128   kthread+0xc0/0xd0
41)     176     176   ret_from_fork+0x7c/0xb0
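
The figures in this trace can be checked with a little arithmetic. The
sketch below is illustrative only: the constants are read off the trace
(22 recursive frames of 392 bytes each, 11192 bytes total depth), the
helper names are hypothetical, and it assumes that the non-recursive
frames always cost what they cost in this one trace and that nothing
else shares the stack.

```c
#include <stddef.h>

/* Figures read off the stack trace in this commit message. */
enum {
	FRAME_BYTES = 392,		/* one dsl_dataset_remove_clones_key frame */
	TRACE_DEPTH_BYTES = 11192,	/* deepest point of the trace */
	TRACE_RECURSIVE_FRAMES = 22	/* recursive frames in the trace */
};

/* Stack consumed by a given number of recursive frames. */
static size_t
recursion_bytes(size_t frames)
{
	return (frames * FRAME_BYTES);
}

/*
 * Rough upper bound on the recursion depth that fits in a stack of
 * stack_bytes, assuming every other frame costs the same as in the
 * trace (a simplification made for this sketch).
 */
static size_t
max_frames(size_t stack_bytes)
{
	size_t fixed = TRACE_DEPTH_BYTES -
	    recursion_bytes(TRACE_RECURSIVE_FRAMES);

	if (stack_bytes <= fixed)
		return (0);
	return ((stack_bytes - fixed) / FRAME_BYTES);
}
```

Under these assumptions the 22 recursive frames alone take 8624 bytes,
more than the whole 8KB (8192 byte) stack. Roughly 14 levels of
recursion fit in 8KB and roughly 77 in 32KB.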

This change reduces the stack usage of dsl_dataset_remove_clones_key
by allocating structures on the heap rather than on the stack. This is
not a fundamental fix, as one can create an arbitrarily large data set
that overruns any fixed-size stack, but it makes the problem far less
likely.
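
The general technique can be sketched in user space as follows. This is
not the code of the change itself: struct cursor, struct attr,
struct node and visit() are hypothetical stand-ins, and malloc/free
stand in for whatever allocator the kernel code uses. The point is only
that each recursive call keeps two pointers on the stack instead of the
structures themselves.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for large per-call scratch structures. */
struct cursor { char state[128]; };
struct attr   { char name[256]; };

/* Hypothetical tree whose depth drives the recursion depth. */
struct node {
	struct node **children;
	size_t nchildren;
};

/*
 * Recursively visit a tree. Returns the number of nodes visited, or
 * -1 if an allocation fails. The scratch structures live on the heap,
 * so each stack frame holds only pointers and a few scalars.
 */
static long
visit(const struct node *n)
{
	struct cursor *c = malloc(sizeof (*c));
	struct attr *a = malloc(sizeof (*a));
	long total = 1;
	size_t i;

	if (c == NULL || a == NULL) {
		total = -1;
		goto out;
	}
	c->state[0] = '\0';	/* scratch use, placeholder for real work */
	a->name[0] = '\0';

	for (i = 0; i < n->nchildren; i++) {
		long sub = visit(n->children[i]);

		if (sub < 0) {
			total = -1;
			break;
		}
		total += sub;
	}
out:
	free(a);		/* free(NULL) is a no-op */
	free(c);
	return (total);
}
```

The trade-off is two allocations per level of recursion and the need to
free on every exit path. The recursion depth is still unbounded, which
is why this only makes the overrun less likely.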

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kohsuke Kawaguchi <kk@kohsuke.org>
Closes #1726

Details

Provenance
Kohsuke Kawaguchi <kk@kohsuke.org> Authored on Sep 25 2013, 10:14 PM
Brian Behlendorf <behlendorf1@llnl.gov> Committed on Sep 25 2013, 10:18 PM
Parents
rG34d5a5fd0321: Fix zpl_mknod() return values

Event Timeline

Brian Behlendorf <behlendorf1@llnl.gov> committed rG77831e17385b: Reduce the stack usage of dsl_dataset_remove_clones_key (authored by Kohsuke Kawaguchi <kk@kohsuke.org>). Sep 25 2013, 10:18 PM