sound: Remove macro magic from pcm/feeder_eq.c
Needs Review · Public

Authored by christos on Wed, Dec 11, 4:49 PM.

Details

Reviewers

markj
emaste
dev_submerge.ch

Summary

Turn the FEEDEQ_DECLARE macro into a single inline function
(feed_eq_biquad()). There is no reason to have this as a macro, and it
only complicates the code. An advantage of this patch is that, because
we no longer call the functions created by the macro through function
pointers (biquad_op), we can call feed_eq_biquad() directly in
feed_eq_feed().
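
To make the shape of the change concrete, here is a minimal userland sketch, not the actual feeder_eq.c code. It assumes a trivial "halve every sample" operation in place of the biquad filter, and every name in it (DEMO_DECLARE, demo_halve(), demo_lookup(), enum demo_fmt) is hypothetical. It contrasts macro-generated per-format functions reached through a function pointer with a single inline function that receives the format as an argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical format tags standing in for the kernel's AFMT_* values. */
enum demo_fmt { DEMO_S16, DEMO_S32 };

/* Before: a macro stamps out one specialized function per sample type. */
#define DEMO_DECLARE(NAME, TYPE)                                \
static void                                                     \
demo_halve_##NAME(uint8_t *buf, size_t count)                   \
{                                                               \
    for (size_t i = 0; i < count; i++) {                        \
        TYPE v;                                                 \
        memcpy(&v, buf + i * sizeof(v), sizeof(v));             \
        v = (TYPE)(v / 2);                                      \
        memcpy(buf + i * sizeof(v), &v, sizeof(v));             \
    }                                                           \
}

DEMO_DECLARE(s16, int16_t)
DEMO_DECLARE(s32, int32_t)

/* Callers reach the generated functions through a function pointer. */
typedef void (*demo_op_t)(uint8_t *, size_t);

static demo_op_t
demo_lookup(enum demo_fmt fmt)
{
    return (fmt == DEMO_S16 ? demo_halve_s16 : demo_halve_s32);
}

/* After: one inline function that takes the format as an argument. */
static inline void
demo_halve(uint8_t *buf, size_t count, enum demo_fmt fmt)
{
    assert(fmt == DEMO_S16 || fmt == DEMO_S32);
    for (size_t i = 0; i < count; i++) {
        if (fmt == DEMO_S16) {
            int16_t v;
            memcpy(&v, buf + i * sizeof(v), sizeof(v));
            v = (int16_t)(v / 2);
            memcpy(buf + i * sizeof(v), &v, sizeof(v));
        } else {
            int32_t v;
            memcpy(&v, buf + i * sizeof(v), sizeof(v));
            v = v / 2;
            memcpy(buf + i * sizeof(v), &v, sizeof(v));
        }
    }
}
```

Both variants produce the same samples. The difference is that the macro version fixes the sample type at compile time, while the function version decides per call (and, unless the compiler hoists the test, per sample).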

Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 61088
Build 57972: arc lint + arc unit

Event Timeline

christos created this revision. Wed, Dec 11, 4:49 PM

Herald added a subscriber: imp. Wed, Dec 11, 4:49 PM

christos requested review of this revision. Wed, Dec 11, 4:49 PM

Harbormaster completed remote builds in B61088: Diff 147820. Wed, Dec 11, 4:49 PM

christos added a child revision: D48033: sound: Remove FEEDEQ_CLAMP(). Wed, Dec 11, 4:49 PM

If you want to make your function equal to the macro in terms of performance, you have to call it with literal format parameters somehow. E.g. through a switch on the format parameters.

dev_submerge.ch mentioned this in D48035: sound: Remove macro magic from pcm/feeder_matrix.c. Fri, Dec 13, 3:01 PM

In D48032#1095744, @dev_submerge.ch wrote:

If you want to make your function equal to the macro in terms of performance, you have to call it with literal format parameters somehow. E.g. through a switch on the format parameters.

I do not have a benchmark right now to prove the opposite, but do you think such a performance hit is noticeable, if at all existent? Currently we use a function pointer to a specialized function (i.e., one for each format), and the patch uses a generic function which fetches the format directly from the struct. I am not really well-versed in compiler optimizations, but I would suppose that the generic one is more likely to be inlined, and thus give us a performance boost (I did see the comment about CPU-branch prediction in D47932).

The main rationale behind this patch, as well as the other similar ones, is to make the code cleaner and easier to work with. There had been a few times already where I tried to make some changes to these files (especially with AFMT_FLOAT support) and the macros make it quite tedious. Even though D47932 fixes most of what I wanted, I still think it's good to have clean code everywhere.

In D48032#1096268, @christos wrote:

In D48032#1095744, @dev_submerge.ch wrote:

If you want to make your function equal to the macro in terms of performance, you have to call it with literal format parameters somehow. E.g. through a switch on the format parameters.

I do not have a benchmark right now to prove the opposite, but do you think such a performance hit is noticeable, if at all existent? Currently we use a function pointer to a specialized function (i.e., one for each format), and the patch uses a generic function which fetches the format directly from the struct. I am not really well-versed in compiler optimizations, but I would suppose that the generic one is more likely to be inlined, and thus give us a performance boost (I did see the comment about CPU-branch prediction in D47932).

The function pointer prevents inlining, alright. But inlining your generic function doesn't help to optimize the inner loops in feed_eq_biquad() that dominate here. The macro did that.

The main rationale behind this patch, as well as the other similar ones, is to make the code cleaner and easier to work with. There had been a few times already where I tried to make some changes to these files (especially with AFMT_FLOAT support) and the macros make it quite tedious. Even though D47932 fixes most of what I wanted, I still think it's good to have clean code everywhere.

I'm not telling you to go back to macros. All you need is a switch on the format parameter, in a dispatch function between the caller and feed_eq_biquad(). Something like:

switch(info->fmt) {
/* Cases you want to optimize for. */
case AFMT_S16_NE:
        feed_eq_biquad(..., ..., ..., AFMT_S16_NE);
        break;
case AFMT_S32_NE:
        feed_eq_biquad(..., ..., ..., AFMT_S32_NE);
        break;
/* Generic fallback, less optimized. */
default:
        feed_eq_biquad(..., ..., ..., info->fmt);
        break;
}

This will let the compiler inline with fixed parameters, transitively for the pcm_sample_read() and pcm_sample_write(). Whether it's critical to do so is hard to tell without a benchmark, but with this you shouldn't get performance regressions compared to the macros.
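
A self-contained userland sketch of the dispatch idea follows. It is an illustration under stated assumptions, not the real feeder code: the sample accessors demo_read() and demo_write() stand in for pcm_sample_read() and pcm_sample_write(), the operation is a simple halving instead of a biquad filter, and all demo_* names and the enum demo_fmt tags are hypothetical. The point is only that each switch case passes a literal format, so after inlining the compiler is free to fold the per-sample format checks inside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical format tags; the real code would use AFMT_* constants. */
enum demo_fmt { DEMO_S8, DEMO_S16, DEMO_S32 };

static inline size_t
demo_width(const enum demo_fmt fmt)
{
    return (fmt == DEMO_S8 ? 1 : fmt == DEMO_S16 ? 2 : 4);
}

/* Hypothetical stand-in for a generic per-sample read (native endian). */
static inline int32_t
demo_read(const uint8_t *p, const enum demo_fmt fmt)
{
    int16_t s16;
    int32_t s32;

    switch (fmt) {
    case DEMO_S16:
        memcpy(&s16, p, sizeof(s16));
        return (s16);
    case DEMO_S32:
        memcpy(&s32, p, sizeof(s32));
        return (s32);
    default:
        return ((int8_t)*p);
    }
}

/* Hypothetical stand-in for a generic per-sample write (native endian). */
static inline void
demo_write(uint8_t *p, int32_t v, const enum demo_fmt fmt)
{
    int16_t s16 = (int16_t)v;

    switch (fmt) {
    case DEMO_S16:
        memcpy(p, &s16, sizeof(s16));
        break;
    case DEMO_S32:
        memcpy(p, &v, sizeof(v));
        break;
    default:
        *p = (uint8_t)(int8_t)v;
        break;
    }
}

/* Generic worker: the format checks sit inside the hot loop. */
static inline void
demo_gain(uint8_t *buf, size_t count, const enum demo_fmt fmt)
{
    const size_t w = demo_width(fmt);

    for (size_t i = 0; i < count; i++)
        demo_write(buf + i * w, demo_read(buf + i * w, fmt) / 2, fmt);
}

/* Dispatcher: literal formats in the cases, generic fallback otherwise. */
static void
demo_gain_dispatch(uint8_t *buf, size_t count, enum demo_fmt fmt)
{
    assert(buf != NULL);
    switch (fmt) {
    case DEMO_S16:
        demo_gain(buf, count, DEMO_S16);
        break;
    case DEMO_S32:
        demo_gain(buf, count, DEMO_S32);
        break;
    default:
        demo_gain(buf, count, fmt);
        break;
    }
}
```

Whether the compiler really specializes each case is not guaranteed by the language; inspecting the generated assembly or running a benchmark is the only way to confirm it for a given compiler and optimization level.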

Address Florian's comment.

Harbormaster completed remote builds in B61267: Diff 148254. Thu, Dec 19, 6:11 PM

Bump.

In D48032#1100138, @christos wrote:

Bump.

The whole stack around these feeder changes depends on D47932, which is still broken. You may want to fix that one first.

In D48032#1096297, @dev_submerge.ch wrote:
In D48032#1096268, @christos wrote:

In D48032#1095744, @dev_submerge.ch wrote:

If you want to make your function equal to the macro in terms of performance, you have to call it with literal format parameters somehow. E.g. through a switch on the format parameters.

I do not have a benchmark right now to prove the opposite, but do you think such a performance hit is noticeable, if at all existent? Currently we use a function pointer to a specialized function (i.e., one for each format), and the patch uses a generic function which fetches the format directly from the struct. I am not really well-versed in compiler optimizations, but I would suppose that the generic one is more likely to be inlined, and thus give us a performance boost (I did see the comment about CPU-branch prediction in D47932).

The function pointer prevents inlining, alright. But inlining your generic function doesn't help to optimize the inner loops in feed_eq_biquad() that dominate here. The macro did that.

The main rationale behind this patch, as well as the other similar ones, is to make the code cleaner and easier to work with. There had been a few times already where I tried to make some changes to these files (especially with AFMT_FLOAT support) and the macros make it quite tedious. Even though D47932 fixes most of what I wanted, I still think it's good to have clean code everywhere.

I'm not telling you to go back to macros. All you need is a switch on the format parameter, in a dispatch function between the caller and feed_eq_biquad(). Something like:
switch(info->fmt) {
/* Cases you want to optimize for. */
case AFMT_S16_NE:
        feed_eq_biquad(..., ..., ..., AFMT_S16_NE);
        break;
case AFMT_S32_NE:
        feed_eq_biquad(..., ..., ..., AFMT_S32_NE);
        break;
/* Generic fallback, less optimized. */
default:
        feed_eq_biquad(..., ..., ..., info->fmt);
        break;
}

I suspect that to make sure this works as expected, you would also want to introduce functions like

static void __noinline
feed_eq_biquad_s16(...)
{
    feed_eq_biquad(..., AFMT_S16_NE);
}

and call those from the switch statement instead. That would make the programmer's intent more obvious, and prevent the compiler from trying to deduplicate the inlined code.
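
A compilable userland sketch of this wrapper pattern, under a few assumptions: outside the kernel there is no __noinline from sys/cdefs.h, so the sketch defines a hypothetical DEMO_NOINLINE using the GCC/Clang noinline attribute; the worker sums samples instead of filtering them; and every demo_* name is made up for illustration. The fmt parameter is declared const, as one would for a value meant to be a compile-time constant at each call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang; the kernel would use __noinline instead. */
#define DEMO_NOINLINE __attribute__((__noinline__))

/* Hypothetical format tags standing in for AFMT_* values. */
enum demo_fmt { DEMO_S16, DEMO_S32 };

/* Generic worker, meant to be inlined into each wrapper below. */
static inline int64_t
demo_sum(const void *buf, size_t count, const enum demo_fmt fmt)
{
    int64_t sum = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < count; i++) {
        if (fmt == DEMO_S16)
            sum += ((const int16_t *)buf)[i];
        else
            sum += ((const int32_t *)buf)[i];
    }
    return (sum);
}

/* One out-of-line wrapper per format, each passing a literal format. */
static DEMO_NOINLINE int64_t
demo_sum_s16(const void *buf, size_t count)
{
    return (demo_sum(buf, count, DEMO_S16));
}

static DEMO_NOINLINE int64_t
demo_sum_s32(const void *buf, size_t count)
{
    return (demo_sum(buf, count, DEMO_S32));
}

/* The switch now only selects a wrapper; the loops live in the wrappers. */
static int64_t
demo_sum_dispatch(const void *buf, size_t count, enum demo_fmt fmt)
{
    switch (fmt) {
    case DEMO_S16:
        return (demo_sum_s16(buf, count));
    case DEMO_S32:
        return (demo_sum_s32(buf, count));
    default:
        /* Generic fallback for formats without a wrapper. */
        return (demo_sum(buf, count, fmt));
    }
}
```

With this layout each specialized loop appears as its own symbol in the object file, which makes it easier to check in the disassembly that the format test was actually folded away.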

This will let the compiler inline with fixed parameters, transitively for the pcm_sample_read() and pcm_sample_write(). Whether it's critical to do so is hard to tell without a benchmark, but with this you shouldn't get performance regressions compared to the macros.

How frequently do these functions get called?

sys/dev/sound/pcm/feeder_eq.c
Line 135: When parameterizing a function with a constant to get better code generation, it's a good idea to declare that parameter as `const`, so `const uint32_t fmt` here.

In D48032#1100281, @markj wrote:
I suspect that to make sure this works as expected, you would also want to introduce functions like
static void __noinline
feed_eq_biquad_s16(...)
{
    feed_eq_biquad(..., AFMT_S16_NE);
}
and call those from the switch statement instead. That would make the programmer's intent more obvious, and prevent the compiler from trying to deduplicate the inlined code.

My experience from other projects is different, C++ in particular, where these patterns are often seen using templates. But we're certainly making assumptions here about the compiler's optimization process. I'd want to have a look at the generated assembly to evaluate this.

This will let the compiler inline with fixed parameters, transitively for the pcm_sample_read() and pcm_sample_write(). Whether it's critical to do so is hard to tell without a benchmark, but with this you shouldn't get performance regressions compared to the macros.

How frequently do these functions get called?

The pcm_sample_read() and pcm_sample_write() get called for every sample processed. Higher level feeder functions get called on small buffer portions, e.g. ~100 frames, AFAIK.

Mark fmt argument as const.

Harbormaster completed remote builds in B61567: Diff 148914. Tue, Jan 7, 8:54 PM

Revision Contents

sys/dev/sound/pcm/feeder_eq.c (213 lines)
