Paths

Table of Contentst

Differential D48763

kernel: Add `%pV` to kernel's printf format string
Needs RevisionPublic
Actions

Authored by dumbbell on Jan 31 2025, 2:31 PM.

Details

Reviewers

imp

Group Reviewers

linuxkpi

Summary

The %pV conversion specification is used by linuxkpi and DRM drivers. This conversion expects a struct va_format argument. This struct has two field members:

fmt that points to another format string
va that points to a va_list.

It allows a "recursive" format string.

The format string parser checks if there is a V following a %p and recurse into kvprintf() with the struct va_format arguments if that's the case. It adds the return value to retval so that the length is correctly computed.

Otherwise it proceeds with the regular handling of %p.

This was worked on as part of the update of DRM drivers to Linux 6.7 but is not required for that update.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

dumbbell created this revision.Jan 31 2025, 2:31 PM

Herald added subscribers: olce, emaste, imp. · View Herald TranscriptJan 31 2025, 2:31 PM

It is a breaking change. I grepped freebsd-src and the only format strings that happen to have this %pV sequence are drivers that come from Linux and have a struct va_format argument (though this code may not be connected to the build).

This implementation follows the behavior of Linux and linuxkpi-based drivers don’t need any modifications. Another approach could be to swap the letters, e.g. %Vp to have V as a modifier of %p.

Yea, this is a stupid API that flies in the face of years and years of how printf works. The modifiers never follow the main format specifier.
It would break a driver than wnated to say "Memory used: %pV %pP" to signify Virtual and Physical addresses of something, though it's likely rare.
It would be better, imho, *NOT* to pollute the base FreeBSD with this junky API.
Instead, I'd create a wrapper for linux printf that re-writes format to at least convert pV to Vp and implement 'V' as a proper modifier. that would be much safer, less invasive and also allow us to have other, weird things from. Linux if we have to should some other format crazy happen in the future that's more likely to break things. Or better yet, just use %V directly... I'm quite worried doing this janky thing is going to prevent us from implementing something in the future and really don't want to see it. Though noticing it's for a var-args arg makes me even more hesitant.

Also, I'm a hard veto on the calling printf recursively, regardless of how/where we implement this. That's just dumb. It will cause all kinds of buffering problems and racing problems since we have to flush out the partial buffer now. I run into this all the time when I have multiple printfs in drivers that wind up interleaving when they aren't line by line. And the drm drivers are verbose enough this will be a problem.

imp requested changes to this revision.Jan 31 2025, 2:46 PM

imp added inline comments.

sys/kern/subr_prf.c
841	This makes no sense to me... why would func being NULL mean we pass a different arg?

This revision now requires changes to proceed.Jan 31 2025, 2:46 PM

If we start this ..

I also have an implementation for %pM (our %D) which is heavily used throughout the wireless drivers (I think we have 350-400 cases) out of which I only changed a few dominant ones.
I refrained from adding it so far.

In D48763#1112273, @imp wrote:

It would be better, imho, *NOT* to pollute the base FreeBSD with this junky API.

Instead, I'd create a wrapper for linux printf that re-writes format to at least convert pV to Vp and implement 'V' as a proper modifier. that would be much safer, less invasive and also allow us to have other, weird things from. Linux if we have to should some other format crazy happen in the future that's more likely to break things.

It's a bit more tricky unfortunately if we want something which works on device_printf() etc. It'll need a "conversion intermediate" for the fmt string for quite a few functions and macros but that's definitively doable, just tedious but can also happen incrementally.

And there's the case (as I mentioned) where %pM on Linux only takes one argument but our %D takes two (which if rewriting I would probably convert to %#D and check for the sharpflag in subr_prf.c which should probably work w/o side effects) or we really take a HUGE bunch of modifiers for %p. So there may be ways but it'll never be 100% great and a bit dodgy.

In the past I was even considering expanding them entirely in LinuxKPI so subr_prf.c would never know about these modifiers or anything from Linux but va_* again makes this really hard.

So yes I am 100% for confining this in LinuxKPI for as much as we can if we find a good way -- especially if that way will work a bit more generic.

In D48763#1112606, @bz wrote:

In D48763#1112273, @imp wrote:

It would be better, imho, *NOT* to pollute the base FreeBSD with this junky API.

+1

Instead, I'd create a wrapper for linux printf that re-writes format to at least convert pV to Vp and implement 'V' as a proper modifier. that would be much safer, less invasive and also allow us to have other, weird things from. Linux if we have to should some other format crazy happen in the future that's more likely to break things.

It's a bit more tricky unfortunately if we want something which works on device_printf() etc. It'll need a "conversion intermediate" for the fmt string for quite a few functions and macros but that's definitively doable, just tedious but can also happen incrementally.

device_printf() isn't a Linux API. Help me understand, maybe with an example, of why we'd need/want it to work there. The rewriting can be done with an inline block, so extra variables don't matter so much (I mean, it is the kernel, but I don't think this would break the bank).

And there's the case (as I mentioned) where %pM on Linux only takes one argument but our %D takes two (which if rewriting I would probably convert to %#D and check for the sharpflag in subr_prf.c which should probably work w/o side effects) or we really take a HUGE bunch of modifiers for %p. So there may be ways but it'll never be 100% great and a bit dodgy.

Yea, even just converting that to a different string would be better. %p is still used in src/sys 8400 times, and 600 times followed by a letter. %p may be deprecated for address in Linux (and this may be a good decision, but one we've not yet undertaken).

In the past I was even considering expanding them entirely in LinuxKPI so subr_prf.c would never know about these modifiers or anything from Linux but va_* again makes this really hard.

So yes I am 100% for confining this in LinuxKPI for as much as we can if we find a good way -- especially if that way will work a bit more generic.

Hence my ugly API comments. Also, we have a bunch of %px in ZFS right now which likely render differently in FreeBSD and Linux. FreeBSD it would be 0xXXXXXx which looks fun. But they are just debug.

I see several %pMs in the Linux wifi code, so your desires there are sane (well, as sane as implementing this crazy API can be). It's 412/666 instances of %p[A-Za-z] in the tree. %px is 48. And there's maybe 50 false positives for non .c files.

But we also have things like:
sys/ofed/drivers/infiniband/core/ib_iwpm_util.c: pr_debug("%s IPV4 %pI4: %u(0x%04X)\n",
which also follow this crazy pattern (but that looks to be a printk format)
%pV, %pd, %pad, %pI, %pK, %ps, %pUl, %pS (some followed by a number for length)

But none appear to be in FreeBSD native code (except false positives not in format strings). So that's good (ish), I guess, for avoiding false positives.

Also, we'd need to teach clang about this stuff. The full spec is rather extensive: https://www.kernel.org/doc/Documentation/printk-formats.txt has the details.

Which is a long way of saying "uff da! You may be right that translation is tricky." If we don't do translation, we'd need to update printf(9). I'm less hesitant about this given we have no existing code that uses %p[A-Za-z] intentionally.

And likely need to teach clang about this fine mess. Since there's no warnings about any of these 6k formats that will be 'wrong' when executed on FreeBSD.

device_printf() isn't a Linux API. Help me understand, maybe with an example, of why we'd need/want it to work there. The rewriting can be done with an inline block, so extra variables don't matter so much (I mean, it is the kernel, but I don't think this would break the bank).

Some things are directly mapped to native *printf functions but having an intermediate wrapper to deal with the format strings will just make them small (inline) functions and be good.

And there's the case (as I mentioned) where %pM on Linux only takes one argument but our %D takes two (which if rewriting I would probably convert to %#D and check for the sharpflag in subr_prf.c which should probably work w/o side effects) or we really take a HUGE bunch of modifiers for %p. So there may be ways but it'll never be 100% great and a bit dodgy.

Yea, even just converting that to a different string would be better. %p is still used in src/sys 8400 times, and 600 times followed by a letter. %p may be deprecated for address in Linux (and this may be a good decision, but one we've not yet undertaken).

In the past I was even considering expanding them entirely in LinuxKPI so subr_prf.c would never know about these modifiers or anything from Linux but va_* again makes this really hard.

So yes I am 100% for confining this in LinuxKPI for as much as we can if we find a good way -- especially if that way will work a bit more generic.

Hence my ugly API comments. Also, we have a bunch of %px in ZFS right now which likely render differently in FreeBSD and Linux. FreeBSD it would be 0xXXXXXx which looks fun. But they are just debug.

I see several %pMs in the Linux wifi code, so your desires there are sane (well, as sane as implementing this crazy API can be). It's 412/666 instances of %p[A-Za-z] in the tree. %px is 48. And there's maybe 50 false positives for non .c files.

But we also have things like:
sys/ofed/drivers/infiniband/core/ib_iwpm_util.c: pr_debug("%s IPV4 %pI4: %u(0x%04X)\n",
which also follow this crazy pattern (but that looks to be a printk format)
%pV, %pd, %pad, %pI, %pK, %ps, %pUl, %pS (some followed by a number for length)

But none appear to be in FreeBSD native code (except false positives not in format strings). So that's good (ish), I guess, for avoiding false positives.

Also, we'd need to teach clang about this stuff. The full spec is rather extensive: https://www.kernel.org/doc/Documentation/printk-formats.txt has the details.

Which is a long way of saying "uff da! You may be right that translation is tricky." If we don't do translation, we'd need to update printf(9). I'm less hesitant about this given we have no existing code that uses %p[A-Za-z] intentionally.

And likely need to teach clang about this fine mess. Since there's no warnings about any of these 6k formats that will be 'wrong' when executed on FreeBSD.

I'll ignore the compilers for a moment and make that a secondary problem ;-)

There's another option:
Don't teach kvprintf(9) about these things but actually write a 2nd implementation as wrapper we can extend just in LinuxKPI and then either pass the full resolved string w/o arguments to our native printf routines so that the output goes to the right channel. It's a bit dogdgy too but contains the problem and optimizations can always be done later. I wonder how hard it would be -- to get that right as well.

For as long as these formats really only appear in Linux(KPI) code we could also go for the route you had suggested and rewrite these %p to say (for this example) to %L and once LinuxKPI loads you get the hook in kvprintf which otherwise deals with this as it does deal with an unknown format, and sets stop. Also ugly.

dumbbell edited the summary of this revision. (Show Details)Feb 8 2025, 2:56 PM

Thank you all for the extensive and great feedback on this patch!

I didn’t address any of your comments so far because it’s not critical for the work on DRM drivers. The i915 drivers uses %pV to display various flags and properties in dmesg at the end the of the driver init. Without this, the message is a bunch of concatenated 0x... pointers (all on one line because the newline is in one of the va_format). So this is ugly but it doesn’t prevent the driver from working.

dumbbell edited the summary of this revision. (Show Details)Sun, Apr 13, 10:16 AM