Page MenuHomeFreeBSD

time(3): Optimize tvtohz() function.
ClosedPublic

Authored by hselasky on Oct 3 2022, 9:19 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 15, 2:12 PM
Unknown Object (File)
Fri, Nov 15, 7:26 AM
Unknown Object (File)
Fri, Nov 15, 6:57 AM
Unknown Object (File)
Fri, Nov 1, 11:07 PM
Unknown Object (File)
Fri, Oct 25, 4:48 PM
Unknown Object (File)
Oct 18 2024, 10:39 PM
Unknown Object (File)
Oct 16 2024, 10:45 AM
Unknown Object (File)
Oct 16 2024, 10:45 AM
Subscribers

Details

Summary

List of changes:

  • Use integer multiplication instead of long multiplication.
  • Remove multiple if-statements and predict new if-statements.
  • Rename local variable name, "ticks" into "retval" to avoid shadowing

the system "ticks" global variable.

MFC after: 1 week
Sponsored by: NVIDIA Networking

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

hselasky retitled this revision from Optimize and improve the tvtohz() function by using bsd_stime_scale_floor(). to time(3): Optimize and improve the tvtohz() function by using bsd_stime_scale_floor()..
hselasky updated this revision to Diff 111342.
hselasky retitled this revision from time(3): Optimize and improve the tvtohz() function by using bsd_stime_scale_floor(). to time(3): Optimize and improve the tvtohz() function using __stime64_scale32_floor()..
hselasky edited the summary of this revision. (Show Details)
hselasky added a reviewer: kib.
hselasky retitled this revision from time(3): Optimize and improve the tvtohz() function using __stime64_scale32_floor(). to time(3): Optimize tvtohz() function..
hselasky edited the summary of this revision. (Show Details)

Spelling fix.

sys/kern/kern_clock.c
568

LL is not needed due to the cast on the left side of mul
Second cast to int64 is indeed required

But, isn't it more wise to have long variants of your stime* functions, and use long there (basically you assume that int64 is same or higher rank than long, which is true for our arches right now, but who knows)

568

The double-cast (time_t->int32->int64) together with earlier checks for INT_MAX assumes that int32 == int, which is true now but not necessary so in the future. Why do you need it at all?

Why casting hz at all?

Also, the summary claims that you switched the code to use 32bit mul, while in fact you force 64bit multiplication (unless compiler is too smart, which I doubt).

hselasky added inline comments.
sys/kern/kern_clock.c
568

I don't want to use long, because the limit is then a variable, LONG_MAX / hz, which results in a division. I found it better to use comparison with INT_MAX .

568

You are right. I don't need to cast "hz". I'll update the patch.

Both clang and gcc know the (int64_t)(int32_t) thing:

volatile int a,b;

 a = 1 << 24;
 b = 1 << 24;

 printf("0x%llx\n", (int64_t)a * (int64_t)b);
 printf("0x%llx\n", (int64_t)a * b);

Results in a single imull instruction when using -m32 or native 32-bit compilation!

      volatile int a,b;

      a = 1 << 24;
401dc6:       c7 45 f8 00 00 00 01    movl   $0x1000000,-0x8(%ebp)
      b = 1 << 24;
401dcd:       c7 45 fc 00 00 00 01    movl   $0x1000000,-0x4(%ebp)

      printf("0x%llx\n", (int64_t)a * (int64_t)b);
401dd4:       8b 45 f8                mov    -0x8(%ebp),%eax
401dd7:       f7 6d fc                imull  -0x4(%ebp)
401dda:       52                      push   %edx
401ddb:       50                      push   %eax
401ddc:       68 fd 03 40 00          push   $0x4003fd
401de1:       e8 ba 00 00 00          call   401ea0 <printf@plt>
hselasky marked 2 inline comments as done.

Remove not needed cast.

Timeouts above INT_MAX seconds are not of interest, even if time_t changes in the future, ticks is still an integer.

sys/kern/kern_clock.c
567

This still assumes that int == int32_t. It can be larger.

sys/kern/kern_clock.c
567

Yes, it can be larger and smaller, but not for the platforms we support?

I can add a _Static_assert(sizeof(int) == sizeof(int32_t), "Assumption about integer types not met.");

sys/kern/kern_clock.c
567

But then why do you need casts at all?
If you so rely on compiler I do not see how casts change anything. I compiled the following program both with clang 15.0.2 and gcc 12.2.0 on amd64, and both compiler generated the imull instruction.

#include <limits.h>

extern int hz;

int
tvtohz(long xx)
{
	if (xx > INT_MAX)
		return (INT_MAX);
	if (xx < 0)
		return (1);
	return (xx * hz);
}
sys/kern/kern_clock.c
567

In fact, I think that uint64_t cast is not needed as well. The holistic test for tv_sec would be not tv_sec > INT_MAX, but tv_sec > INT_MAX / hz. The INT_MAX / hz value can be precomputed once in the place which sets hz, to avoid divide in each tvtohz() invocation.

Precomputing "INT_MAX / hz" sounds like a good idea.

I'll have a go at it. Will simplify the code a-lot!

hselasky edited the summary of this revision. (Show Details)

Implement integer-only approach as suggested by kib@ .

Use a compile-time assert instead of run-time assert.

kern.hz_max/min addition should be committed separately

sys/kern/kern_clock.c
569

This changes the visible behavior. Previous code allowed negative tv_usecs (although not quite correctly).

587

This would break compilation on arches where one of the types is not long. Use explicit cast for tv_sec/tv_usec to long with %ld format.

sys/kern/kern_clock.c
596

Move comment right above macro definition

597

Outer int cast is not needed

602

How can this work, hz is runtime value, _Static_assert is compile-time construct.

sys/kern/kern_clock.c
602

The formula is linear. Consider it more a sanity check.

sys/kern/kern_clock.c
602

I was confused until I realized hz here != extern int hz. It's the arg to the macro, so this just makes sure we have a good range... Maybe change it? Or have a better name than TIME_ASSERT for the macro.

IMHO, it should be out the function to reinforce that it's not code that executes.

mav accepted this revision.EditedOct 22 2022, 12:34 AM

I agree with Konstatin's comments, otherwise looks good to me. Thinking about the HZ_MAXIMUM value I wonder where is " >> 6" coming from? Is it useful to allow hz > 100KHz? But then above ~15KHz " >> 6" will loose precision. If HZ_MAXIMUM is set to ~68K, then " >> 5" could be better here.

This revision is now accepted and ready to land.Oct 22 2022, 12:34 AM

@mav :

I'll fix the precision issue. Using >> 6 is the best.

hselasky marked 6 inline comments as done.

Address comments from @kib and @mav .

This function should now give 100% reliable results at the expense of one more regular integer division and multiplication.

This revision now requires review to proceed.Oct 22 2022, 9:23 AM
NOTE: The compiler will convert division by 1 << n, into regular shift instructions.
sys/kern/kern_clock.c
567

New logic does not make assumptions about size of "int".

This revision is now accepted and ready to land.Oct 22 2022, 5:48 PM
kib added inline comments.
sys/kern/kern_clock.c
561

There should be a space after comma, for all macro definitions.

This revision was automatically updated to reflect the committed changes.