ktrace: Record detailed ECAPMODE violations
Closed, Public

Authored by jfree on Jun 20 2023, 10:03 PM.

Details

Reviewers

markj
oshogbo

Commits

rG9da71e6353a2: ktrace: Record detailed ECAPMODE violations
rG9bec84131215: ktrace: Record detailed ECAPMODE violations

Summary

ktrace: Record detailed ECAPMODE violations

When a Capsicum violation occurs in the kernel, ktrace will now record
detailed information pertaining to the violation.

For example:
- When a namei lookup violation occurs, ktrace will record the path.
- When a signal violation occurs, ktrace will record the signal number.
- When a sendto(2) violation occurs, ktrace will record the recipient
  sockaddr.

For all violations, the syscall and ABI are recorded.

kdump is also modified to display this new information to the user.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

jfree created this revision. Jun 20 2023, 10:03 PM

Herald added a subscriber: imp. Jun 20 2023, 10:03 PM

jfree requested review of this revision. Jun 20 2023, 10:03 PM

jfree added child revisions (Jun 20 2023, 10:21 PM):
- D40681: ktrace: Record socket violations with KTR_CAPFAIL
- D40680: ktrace: Record namei violations with KTR_CAPFAIL
- D40679: ktrace: Record signal violations with KTR_CAPFAIL
- D40678: ktrace: Record syscall violations with KTR_CAPFAIL
- D40677: ktrace: Record cpuset violations with KTR_CAPFAIL
- D40682: tests: Add ktrace capability violation test cases

Overall this looks good to me. I wonder if @emaste, @oshogbo or @theraven have any thoughts on it? To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.

sys/kern/sys_capability.c
161	Missing a newline after a variable definition.
183	Missing a newline after a variable definition.
sys/sys/ktrace.h
215
220	IMO, CAPFAIL_NAMEI would be a better name for this.
222	Missing newlines between the type definitions.
229	This can be PATH_MAX instead, in which case you can pull in syslimits.h instead of the much larger param.h.
usr.bin/kdump/kdump.c
2139	Missing newline after a local variable definition.

To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.

So this traces the system calls that are not on the allowed-in-cap-mode list?

In D40676#958207, @theraven wrote:

To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.

So this traces the system calls that are not on the allowed-in-cap-mode list?

Among other things (like calling namei() with an absolute path), yes.

To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.

I haven't looked into the code, to be honest. However, I don't see a real application for this approach, or maybe I misread how this is supposed to work.
Is this a tool for improving debugging sandboxed applications or sandboxing new applications?
If the second, this doesn't help determine the "trusted part" and "capability part" of the application because after re-arranging the code I still will get the same errors.
If I would try to sandbox a new application why not put cap_enter at a "random" place and see how ktrace reports the same errors?

After a quick scan, I don't see any additional parameter to ktrace. Can I see a sample output of ktrace? Won't it be misleading for the user that something is happening in TCB ("trusted part") and will be reported as a violation of Capsicum?

Again, maybe I just need some more context to understand the reasoning behind this change.

Again, maybe I just need some more context to understand the reasoning behind this change.

Here is an example of ktrace CAPFAIL tracing in action, with this patch:

https://cdaemon.com/posts/capsicum#detecting-violations

In D40676#958215, @oshogbo wrote:

To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.

I haven't looked into the code, to be honest. However, I don't see a real application for this approach, or maybe I misread how this is supposed to work.
Is this a tool for improving debugging sandboxed applications or sandboxing new applications?

Mostly for sandboxing new applications.

If the second, this doesn't help determine the "trusted part" and "capability part" of the application because after re-arranging the code I still will get the same errors.

No, it doesn't tell you how to split out your code. The idea is that it would be useful early on, when becoming familiar with an application's behaviour. Tracing records

If I would try to sandbox a new application why not put cap_enter at a "random" place and see how ktrace reports the same errors?

Because the application will (probably) quickly fail and exit or just spin its wheels somehow, because most of its system calls will fail. This new mode does not affect the behaviour of the program.

After a quick scan, I don't see any additional parameter to ktrace. Can I see a sample output of ktrace? Won't it be misleading for the user that something is happening in TCB ("trusted part") and will be reported as a violation of Capsicum?

Again, maybe I just need some more context to understand the reasoning behind this change.

If I understand correctly, for an application like:

localtime();
open();
cap_enter();
openat();

The first two operations will always cause ktrace to report insufficient capabilities, which is a false-positive statement and will be misleading for "normal" users.
An application that wasn't supposed to be sandboxed will also get these errors.
I think there should be a special flag to ktrace which says "Report all capability violations".

I would also consider doing something else instead of, or maybe in addition to, extending ktrace: adding an lldb plugin that lets you specify the point from which this kind of violation should be reported.
Or, if lldb is too complicated, maybe we can add a special syscall, cap_enter_report_only or something like that.

In D40676#958254, @oshogbo wrote:
If I understand correctly, for an application like:
localtime();
open();
cap_enter();
openat();
The first two operations will always cause ktrace to report insufficient capabilities, which is a false-positive statement and will be misleading for "normal" users.

What do you mean by "normal user"? To get these events, you have to explicitly ask ktrace to tell you about capability mode violations.

An application that wasn't supposed to be sandboxed will also get these errors.
I think there should be a special flag to ktrace which says "Report all capability violations".

It already is. You have to specify ktrace -t p. The "p" flag is "trace capability check failures".
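
For readers who want to try the flag described above, here is a minimal sketch of a wrapper that traces only capability check failures for one command and then prints the records. It relies on `ktrace -t p` as stated in this comment, plus the standard `-f` trace-file options of ktrace and kdump. The function name `trace_capfail`, the `TRACEFILE` variable, and the default file name are hypothetical choices made for illustration, not part of the patch.

```shell
#!/bin/sh
# Hypothetical helper: run a command under ktrace with only the "p"
# (capability check failure) trace point enabled, then dump the trace.
# TRACEFILE and the default path are illustrative assumptions.
trace_capfail() {
    tracefile=${TRACEFILE:-./capfail.ktrace}
    # Record capability violations for the given command and its arguments.
    ktrace -f "$tracefile" -t p "$@"
    # Print the recorded events in human-readable form.
    kdump -f "$tracefile"
}

# Example (on a system with ktrace and kdump available):
#   trace_capfail ls -1
```

Since the traced program is not actually placed in capability mode, it should run to completion as usual; the trace only lists the operations that would have been refused.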

I would also consider doing something else instead of, or maybe in addition to, extending ktrace: adding an lldb plugin that lets you specify the point from which this kind of violation should be reported.
Or, if lldb is too complicated, maybe we can add a special syscall, cap_enter_report_only or something like that.

Ah, ok I thought it was printed by default.
Then I don't think I have any complaints about the idea.

Are these events exposed to DTrace? When sandboxing, the thing I really want is a stack trace in userspace at the point where the violation happened. If so, it would be great to include a script that logged them. Ideally with an option of an explicit start marker so you can put in a fake cap_enter and be told what you still need to fix.

In D40676#958341, @theraven wrote:

Are these events exposed to DTrace? When sandboxing, the thing I really want is a stack trace in userspace at the point where the violation happened. If so, it would be great to include a script that logged them. Ideally with an option of an explicit start marker so you can put in a fake cap_enter and be told what you still need to fix.

It's doable in principle, but in practice dtrace's inability to resolve backtraces in the face of fork/exec makes it mostly unusable. For instance, dtrace -n 'fbt::ktrcapfail:entry /progenyof($pid)/{ustack();}' -c "ktrace -t p ls" works up until ktrace execs the target process. This is really a limitation of dtrace and doesn't have anything to do with ktrace.

It works somewhat better if you're experimenting with a daemon that you can attach to. For instance, below, pid 3641 is syslogd:

# ktrace -t p -p 3641
# sleep 2 && kill -HUP 3641 &
# dtrace -n 'fbt::ktrcapfail:entry /pid == $target/{ustack();}' -p 3641
CPU     ID                    FUNCTION:NAME
  7  50830                 ktrcapfail:entry
              libc.so.7`_open+0xa
              libc.so.7`0x9726e899fcf
              libc.so.7`0x9726e8990ff
              libc.so.7`tzset+0x36
              syslogd`init+0xf6
              syslogd`main+0xeb1
              libc.so.7`__libc_start1+0x12a
              syslogd`_start+0x2d
              `0xf98aa403008

  7  50830                 ktrcapfail:entry 
              libc.so.7`_open+0xa
              syslogd`init+0x1fa
              syslogd`main+0xeb1
              libc.so.7`__libc_start1+0x12a
              syslogd`_start+0x2d
              `0xf98aa403008

  7  50830                 ktrcapfail:entry 
              libc.so.7`_openat+0xa
              syslogd`cfline+0x8fd
              syslogd`parseconfigfile+0x602
              syslogd`init+0x20f
              syslogd`main+0xeb1
              libc.so.7`__libc_start1+0x12a
              syslogd`_start+0x2d
              `0xf98aa403008
...

so you can get some sense for what's going on.

All this aside, I'd argue that having the "capfail" tracepoints identified and enumerated (in this case by ktrace) is useful in its own right. You could for instance use ktrace to get a full trace of an application's behaviour, then subsequently use gdb or dtrace or whatever to drill down into specific system calls.

It's doable in principle, but in practice dtrace's inability to resolve backtraces in the face of fork/exec makes it mostly unusable

I think around 20% of the places I've used Capsicum have done fork/execve. This would have been a huge win for the most recent thing (which used the Tesseract OCR libraries in a Capsized process and spent a while adding the compat syscall wrappers to use in cap mode). Learning that libc calls open via _open because it hates me and everything I stand for and that libomp uses __sys_shm_open2 via the shm_open wrapper took a while to discover and would have taken a minute or two with that DTrace script.

I created this patch to make the Capsicumization experience less intimidating for inexperienced developers. Both David and Mariusz may not be the target audience for this change because they already know how to extract the information that the tracing provides. Developers that are unfamiliar with Capsicum's semantics could use this tracing mode to easily determine why their program is not working in capability mode. I think it provides a solid starting point so new developers don't get lost and discouraged.

The barrier to entry for Capsicum development is high and this tracing flag is an extra tool in a new developer's toolbox that will ease them into Capsicum. It really can only help.

I created this patch to make the Capsicumization experience less intimidating for inexperienced developers. Both David and Mariusz may not be the target audience for this change

Oh, I am definitely in the target audience for this, thank you for working on it! If I’d had this and the DTrace script a few weeks ago, it would have saved me a few hours of work.

Hello Jake,

I have raised my concerns about the approach you have taken. I asked questions to understand it better and suggested some potential fixes. Mark has proved me wrong, and in the end, I supported the change. This is how the peer review works. You have been working on this for weeks, or maybe months, and I was trying to understand your approach to the issue. :)

I'm not sure who else should do this review, if not people who have been working on Capsicum and Capsicumizing applications.

Nontechnical issues aside, in my opinion, DTrace is one of the ways to accomplish that. However, I don't see a reason not to extend the ktrace output with additional information. In the end, we can have multiple tools for different levels of experience, especially since DTrace is another complicated tool one must learn.

jfree marked 7 inline comments as done. Mar 10 2024, 4:13 AM

jfree added inline comments.

sys/sys/ktrace.h
229	This can be PATH_MAX instead, in which case you can pull in syslimits.h instead of the much larger param.h.
	Other macros in this file depend on <sys/param.h>, so if we are trying to make headers independent, then we already have to include it.

Herald added a subscriber: olce. Mar 10 2024, 4:13 AM

Address Mark's comments
Rebase on main after several months

Harbormaster completed remote builds in B56527: Diff 135578. Mar 10 2024, 4:15 AM

jfree edited the summary of this revision. Mar 10 2024, 4:15 AM

markj accepted this revision. Mar 29 2024, 3:33 PM

This revision is now accepted and ready to land. Mar 29 2024, 3:33 PM

oshogbo accepted this revision. Mar 29 2024, 4:19 PM

Closed by commit rG9bec84131215: ktrace: Record detailed ECAPMODE violations (authored by jfree). Apr 7 2024, 11:58 PM

This revision was automatically updated to reflect the committed changes.

jfree added a commit: rG9bec84131215: ktrace: Record detailed ECAPMODE violations.

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled, which is confusing and distracting. Can this please be reverted to behave without CAP output for systems w/o capsicum ?

Eg:

<11:53am>beast/gallatin:~>ktrace ls -1 | wc -l

<11:55am>beast/gallatin:~>kdump | grep CAP
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: readlink
22235 ls CAP fstatat: restricted VFS lookup: AT_FDCWD
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: fchdir
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: fchdir
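
When output like the above is long, it can help to tally the CAP records per command and message. The following is a minimal sketch, assuming the line layout shown in this comment (pid, command name, the literal `CAP`, then the message). `summarize_capfail` is a hypothetical helper name and is not part of ktrace or kdump.

```shell
#!/bin/sh
# Hypothetical helper: count CAP records read from kdump-style text on stdin.
# Assumes each record looks like "<pid> <command> CAP <message>".
summarize_capfail() {
    awk '$3 == "CAP" {
        # Rebuild the message from the fourth field onward.
        msg = $4
        for (i = 5; i <= NF; i++) msg = msg " " $i
        count[$2 ": " msg]++
    }
    END {
        for (key in count) printf "%d %s\n", count[key], key
    }' | sort -k1,1nr -k2   # most frequent first, ties in text order
}

# Example, assuming ktrace.out is in the current directory:
#   kdump | summarize_capfail
```

Lines whose third field is not `CAP` (ordinary syscall, return, or namei records) are ignored, so the helper can be fed a full kdump listing.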

In D40676#1027000, @gallatin wrote:

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled, which is confusing and distracting. Can this please be reverted to behave without CAP output for systems w/o capsicum ?

This was done already in commit f239db4800ee9e7ff8485f96b7a68e6c38178c3b.

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled

Are systems without Capsicum still supported? I thought that option was removed in 14.

In D40676#1027003, @theraven wrote:

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled

Are systems without Capsicum still supported? I thought that option was removed in 14.

Userland always builds with capsicum. Kernel support is still optional.

In D40676#1027002, @markj wrote:

In D40676#1027000, @gallatin wrote:

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled, which is confusing and distracting. Can this please be reverted to behave without CAP output for systems w/o capsicum ?

This was done already in commit f239db4800ee9e7ff8485f96b7a68e6c38178c3b.