ktrace: Record detailed ECAPMODE violations

When a Capsicum violation occurs in the kernel, ktrace will now record detailed information pertaining to the violation. For example:
- When a namei lookup violation occurs, ktrace will record the path.
- When a signal violation occurs, ktrace will record the signal number.
- When a sendto(2) violation occurs, ktrace will record the recipient sockaddr.

For all violations, the syscall and ABI are recorded. kdump is also modified to display this new information to the user.
Details
Diff Detail
- Repository: rG FreeBSD src repository
- Lint: Lint Skipped
- Unit: Tests Skipped
- Build Status: Buildable 56527, Build 53415: arc lint + arc unit
Event Timeline
sys/kern/sys_capability.c | ||
---|---|---|
160 | Missing a newline after a variable definition. | |
184–185 | Missing a newline after a variable definition. | |
sys/sys/ktrace.h | ||
212 | ||
217 | IMO, CAPFAIL_NAMEI would be a better name for this. | |
219 | Missing newlines between the type definitions. | |
226 | This can be PATH_MAX instead, in which case you can pull in syslimits.h instead of the much larger param.h. | |
usr.bin/kdump/kdump.c | ||
2128 | Missing newline after a local variable definition. |
To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.
So this traces the system calls that are not on the allowed-in-cap-mode list?
> To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.
I haven't looked into the code, to be honest. However, I don't see a real application for this approach, or maybe I misread how this is supposed to work.
Is this a tool for improving debugging sandboxed applications or sandboxing new applications?
If the second, this doesn't help determine the "trusted part" and "capability part" of the application because after re-arranging the code I still will get the same errors.
If I would try to sandbox a new application why not put cap_enter at a "random" place and see how ktrace reports the same errors?
After a quick scan, I don't see any additional parameter to ktrace. Can I see a sample output of ktrace? Won't it be misleading for the user that something is happening in TCB ("trusted part") and will be reported as a violation of Capsicum?
Again, maybe I just need some more context to understand the reasoning behind this change.
> Again, maybe I just need some more context to understand the reasoning behind this change.
Here is an example of ktrace CAPFAIL tracing in action, with this patch:
Mostly for sandboxing new applications.
> If the second, this doesn't help determine the "trusted part" and "capability part" of the application because after re-arranging the code I still will get the same errors.
No, it doesn't tell you how to split out your code. The idea is that it would be useful early on, when becoming familiar with an application's behaviour. Tracing records
> If I would try to sandbox a new application why not put cap_enter at a "random" place and see how ktrace reports the same errors?
Because the application will (probably) quickly fail and exit or just spin its wheels somehow, because most of its system calls will fail. This new mode does not affect the behaviour of the program.
> After a quick scan, I don't see any additional parameter to ktrace. Can I see a sample output of ktrace? Won't it be misleading for the user that something is happening in TCB ("trusted part") and will be reported as a violation of Capsicum?
> Again, maybe I just need some more context to understand the reasoning behind this change.
If I understand correctly, for application like:
localtime();
open();
cap_enter();
openat();
The first two operations will always cause ktrace to report insufficient capabilities, which is a false positive and will be misleading for "normal" users.
The application that wasn't supposed to be sandboxed will also get these errors.
I think this should be a special flag to ktrace which says "Report all capabilities violation".
Instead of, or maybe in addition to, extending ktrace, I would also consider adding an lldb plugin that lets you specify the point from which this kind of violation should be reported.
Or, if lldb is too complicated, maybe we can add a special syscall, cap_enter_report_only or something like that.
What do you mean by "normal user"? To get these events, you have to explicitly ask ktrace to tell you about capability mode violations.
> The application that wasn't supposed to be sandboxed will also get these errors.
> I think this should be a special flag to ktrace which says "Report all capabilities violation".
It already is. You have to specify ktrace -t p. The "p" flag is "trace capability check failures".
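As a concrete illustration of the opt-in described above, the following Python sketch builds the two command lines involved: one that runs a program under ktrace with only the "p" trace point enabled, and one that decodes the resulting trace file with kdump. The helper name `capfail_trace_commands` and its default file name are assumptions made for this example; `-f` and `-t` are the usual ktrace(1) and kdump(1) options.

```python
def capfail_trace_commands(target_argv, trace_file="ktrace.out"):
    """Return (ktrace_cmd, kdump_cmd) argument lists for a capfail-only trace.

    target_argv is the program to trace plus its arguments, e.g. ["ls", "-1"].
    The function name and the trace_file default are illustrative choices.
    """
    # "-t p" restricts tracing to capability check failures.
    ktrace_cmd = ["ktrace", "-f", trace_file, "-t", "p", *target_argv]
    # kdump decodes the binary trace file into readable records.
    kdump_cmd = ["kdump", "-f", trace_file]
    return ktrace_cmd, kdump_cmd
```

On a FreeBSD system with this patch, the two lists could be passed to `subprocess.run` one after the other; nothing is recorded unless the "p" trace point is requested.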
> Instead of, or maybe in addition to, extending ktrace, I would also consider adding an lldb plugin that lets you specify the point from which this kind of violation should be reported.
> Or, if lldb is too complicated, maybe we can add a special syscall, cap_enter_report_only or something like that.
Ah, ok I thought it was printed by default.
Then I don't think I have any complaints about the idea.
Are these events exposed to DTrace? When sandboxing, the thing I really want is a stack trace in userspace at the point where the violation happened. If so, it would be great to include a script that logged them. Ideally with an option of an explicit start marker so you can put in a fake cap_enter and be told what you still need to fix.
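The "explicit start marker" idea can also be approximated after the fact by filtering kdump output. The sketch below is only an illustration under stated assumptions: it assumes each record is one text line of the form "pid command TYPE ...", that the trace also contains some recognizable marker line (for example the CALL record of a distinctive system call made where cap_enter would go, which requires tracing system calls as well), and the function name and marker text are hypothetical.

```python
def cap_records_after_marker(lines, marker):
    """Yield CAP records that appear after the first line containing marker.

    lines  -- iterable of kdump output lines (assumed "pid comm TYPE ..." shape)
    marker -- substring identifying where the sandbox would start (hypothetical)
    """
    seen_marker = False
    for line in lines:
        fields = line.split()
        if not seen_marker:
            # Match against whitespace-normalized text so column padding is ignored.
            seen_marker = marker in " ".join(fields)
            continue
        # Keep only capability-failure records once the marker has been seen.
        if len(fields) > 2 and fields[2] == "CAP":
            yield line.rstrip("\n")
```

Violations reported before the marker, such as those from startup code that would run before cap_enter, are dropped; if the marker never appears, nothing is reported.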
It's doable in principle, but in practice dtrace's inability to resolve backtraces in the face of fork/exec makes it mostly unusable. For instance, dtrace -n 'fbt::ktrcapfail:entry /progenyof($pid)/{ustack();}' -c "ktrace -t p ls" works up until ktrace execs the target process. This is really a limitation of dtrace and doesn't have anything to do with ktrace.
It works somewhat better if you're experimenting with a daemon that you can attach to. For instance, below, pid 3641 is syslogd:
# ktrace -t p -p 3641
# sleep 2 && kill -HUP 3641 &
# dtrace -n 'fbt::ktrcapfail:entry /pid == $target/{ustack();}' -p 3641
CPU     ID                    FUNCTION:NAME
  7  50830                 ktrcapfail:entry
              libc.so.7`_open+0xa
              libc.so.7`0x9726e899fcf
              libc.so.7`0x9726e8990ff
              libc.so.7`tzset+0x36
              syslogd`init+0xf6
              syslogd`main+0xeb1
              libc.so.7`__libc_start1+0x12a
              syslogd`_start+0x2d
              `0xf98aa403008
  7  50830                 ktrcapfail:entry
              libc.so.7`_open+0xa
              syslogd`init+0x1fa
              syslogd`main+0xeb1
              libc.so.7`__libc_start1+0x12a
              syslogd`_start+0x2d
              `0xf98aa403008
  7  50830                 ktrcapfail:entry
              libc.so.7`_openat+0xa
              syslogd`cfline+0x8fd
              syslogd`parseconfigfile+0x602
              syslogd`init+0x20f
              syslogd`main+0xeb1
              libc.so.7`__libc_start1+0x12a
              syslogd`_start+0x2d
              `0xf98aa403008
...
so you can get some sense for what's going on.
All this aside, I'd argue that having the "capfail" tracepoints identified and enumerated (in this case by ktrace) is useful in its own right. You could for instance use ktrace to get a full trace of an application's behaviour, then subsequently use gdb or dtrace or whatever to drill down into specific system calls.
> It's doable in principle, but in practice dtrace's inability to resolve backtraces in the face of fork/exec makes it mostly unusable
I think around 20% of the places I've used Capsicum have done fork/execve. This would have been a huge win for the most recent thing (which used the Tesseract OCR libraries in a Capsized process and spent a while adding the compat syscall wrappers to use in cap mode). Learning that libc calls open via _open because it hates me and everything I stand for and that libomp uses __sys_shm_open2 via the shm_open wrapper took a while to discover and would have taken a minute or two with that DTrace script.
I created this patch to make the Capsicumization experience less intimidating for inexperienced developers. Both David and Mariusz may not be the target audience for this change because they already know how to extract the information that the tracing provides. Developers that are unfamiliar with Capsicum's semantics could use this tracing mode to easily determine why their program is not working in capability mode. I think it provides a solid starting point so new developers don't get lost and discouraged.
The barrier to entry for Capsicum development is high and this tracing flag is an extra tool in a new developer's toolbox that will ease them into Capsicum. It really can only help.
> I created this patch to make the Capsicumization experience less intimidating for inexperienced developers. Both David and Mariusz may not be the target audience for this change
Oh, I am definitely in the target audience for this, thank you for working on it! If I’d had this and the DTrace script a few weeks ago, it would have saved me a few hours of work.
Hello Jake,
I have raised my concerns about the approach you have taken. I asked questions to understand it better and suggested some potential fixes. Mark has proved me wrong, and in the end, I supported the change. This is how the peer review works. You have been working on this for weeks, or maybe months, and I was trying to understand your approach to the issue. :)
I'm not sure who else should do this review, if not people who have been working on Capsicum and Capsicumizing applications.
Nontechnical issues aside, in my opinion DTrace is one of the ways to accomplish that. However, I don't see a reason not to extend ktrace output with additional information. In the end, we can have multiple tools for different levels of experience, especially since DTrace is another complicated tool one must learn.
sys/sys/ktrace.h | ||
---|---|---|
226 | Other macros in this file depend on <sys/param.h>, so if we are trying to make headers independent, then we already have to include it. | |
After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o Capsicum enabled, which is confusing and distracting. Can this please be reverted so that there is no CAP output on systems w/o Capsicum?
Eg:
<11:53am>beast/gallatin:~>ktrace ls -1 | wc -l
319
<11:55am>beast/gallatin:~>kdump | grep CAP
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: readlink
22235 ls CAP fstatat: restricted VFS lookup: AT_FDCWD
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: fchdir
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: fchdir
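To make output like the above easier to scan, the repeated CAP records can be tallied. The following Python sketch counts records per command and message; it assumes each record is a single line shaped like the ones shown here ("pid command CAP message"), and the function name `summarize_cap_records` is made up for this example.

```python
import re
from collections import Counter

# Matches lines shaped like "22235 ls CAP system call not allowed: open".
CAP_LINE = re.compile(r"^\s*(?P<pid>\d+)\s+(?P<comm>\S+)\s+CAP\s+(?P<msg>.+?)\s*$")


def summarize_cap_records(lines):
    """Count CAP records per (command, message); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = CAP_LINE.match(line)
        if match:
            counts[(match["comm"], match["msg"])] += 1
    return counts
```

Passing `sys.stdin` as `lines` with kdump output piped in would reduce the listing above to a handful of distinct entries, each with its count.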
> After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o Capsicum enabled
Are systems without Capsicum still supported? I thought that option was removed in 14.