Page MenuHomeFreeBSD

time(3): Increase precision of sbt and bintime conversion functions by using gcd.
ClosedPublic

Authored by hselasky on Oct 2 2022, 10:23 PM.
Tags
None
Referenced Files
F108441755: D36857.id111328.diff
Fri, Jan 24, 7:26 PM
F108441747: D36857.id111345.diff
Fri, Jan 24, 7:26 PM
Unknown Object (File)
Fri, Jan 24, 5:29 PM
Unknown Object (File)
Sun, Jan 19, 5:08 AM
Unknown Object (File)
Sun, Jan 19, 4:47 AM
Unknown Object (File)
Sun, Jan 19, 3:53 AM
Unknown Object (File)
Sat, Jan 18, 9:15 PM
Unknown Object (File)
Tue, Jan 7, 3:47 AM

Details

Summary

When converting times to and from units which have many leading zeros,
it pays off to compute the greatest common divisor first, and then do the
scaling product. This way all time unit conversion comes down to scaling a
signed or unsigned 64-bit value by a fraction represented by two signed
or unsigned 32-bit integers.

For example:

SBT_1S is defined as 2^32 . When scaling using powers of 10 above 1,
the gcd of SBT_1S and 10^N is always greater than or equal to 4,
when N is greater or equal to 2.

For example scaling a sbt value to milliseconds is done by multiplying
by (1000 / 8) and dividing by (2^32 / 8).

This trick allows for higher sbt precision at very little additional CPU cost.

MFC after: 1 week
Sponsored by: NVIDIA Networking

Test Plan
#include <stdio.h>
#include "/usr/src/sys/sys/time.h"

int
main()
{
	for (int64_t x = 0; x != 2000000000; x++) {
		int64_t y = nstosbt(x);
		int64_t z = sbttons(y);

		if (z != x)
			printf("NS ERROR %zd %zd\n", z, x);
	}

	for (int64_t x = 0; x != 2000000; x++) {
		int64_t y = ustosbt(x);
		int64_t z = sbttous(y);

		if (z != x)
			printf("US ERROR %zd %zd\n", z, x);
	}

	for (int64_t x = 0; x != 2000; x++) {
		int64_t y = mstosbt(x);
		int64_t z = sbttoms(y);

		if (z != x)
			printf("MS ERROR %zd %zd\n", z, x);
	}

	return (0);
}

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

hselasky created this revision.
sys/sys/time.h
169

I am not sure what does the definition mean. I suspect it is not GCD, but the largest power of 2 dividing both of the arguments, am I right?

I also needed the following hint from the hacker's delight to decipher the calculation. I believe it must be included in the description.

create a word with 1’s at the positions of the trailing 0’s in x, and 0’s elsewhere, producing 0 if none (e.g., 01011000 ->0000 0111):
~x & x – 1
172

Why all these macros cannot be static inline functions?
I have to look up the signatures of the used functions to understand the rules.

Also, I do not think we should put symbols in the user namespace, that are not supposed to be accessed by users. Please use implementation namespace: __xxxx, i.e. two underscores in front of alphabetical characters.

Implement suggestions from @kib

  • use __ to prefix internal function names.
  • add more detailed explanation.
  • use static inline functions when possible.
x=125 NEW: 536870912 0  OLD: 536870913 0 DIFF: -1 sbt 0 ms
x=250 NEW: 1073741824 0  OLD: 1073741825 0 DIFF: -1 sbt 0 ms
x=375 NEW: 1610612736 0  OLD: 1610612737 0 DIFF: -1 sbt 0 ms
x=500 NEW: 2147483648 0  OLD: 2147483649 0 DIFF: -1 sbt 0 ms
x=625 NEW: 2684354560 0  OLD: 2684354561 0 DIFF: -1 sbt 0 ms
x=750 NEW: 3221225472 0  OLD: 3221225473 0 DIFF: -1 sbt 0 ms
x=875 NEW: 3758096384 0  OLD: 3758096385 0 DIFF: -1 sbt 0 ms
x=1125 NEW: 4831838208 0  OLD: 4831838209 0 DIFF: -1 sbt 0 ms
x=1250 NEW: 5368709120 0  OLD: 5368709121 0 DIFF: -1 sbt 0 ms
x=1375 NEW: 5905580032 0  OLD: 5905580033 0 DIFF: -1 sbt 0 ms
x=1500 NEW: 6442450944 0  OLD: 6442450945 0 DIFF: -1 sbt 0 ms
x=1625 NEW: 6979321856 0  OLD: 6979321857 0 DIFF: -1 sbt 0 ms
x=1750 NEW: 7516192768 0  OLD: 7516192769 0 DIFF: -1 sbt 0 ms
x=1875 NEW: 8053063680 0  OLD: 8053063681 0 DIFF: -1 sbt 0 ms

Here we see that the old sbt conversions are off by-one for 125ms for example.

Looking at the hex values you see (1/8 hz) is not correctly computed as sbt:

octave:3> dec2hex(536870912)
ans = 20000000 (correct)
octave:4> dec2hex(536870913)
ans = 20000001 (wrong)

Something to consider? :

It looks to me that the original code tried to minimize the number of divisions (and modulo's). This might have been because of the likes of, say, "ARMv7-A implementation that does not include the Virtualization Extensions" having SDIV and UDIV as optional in the profile and ARMv7-R only requiring such instructions for Thumb, not ARM. (Just examples, I've no general knowledge of the range of platform's status for such.) Also: so far as I know, there is no 64-bit division instruction. The prior is about 32-bit division.

I also wonder if factor or divisor or the like is intended to ever allowed to be negative in the use of %. (Some routines have signed types involved.) Checking C's language requirements for negatives involved in % operations could be important otherwise: If there is C implementation freedom, that could lead to variability in the results.

Looks like C89 was the last with "implementation defined" for / and % involving negative integers. After that: 7/3==2; 7%3==1; -7/3==-2; -7%3==-1; 7/-3==-2; 7/-3==1; -7/-3==2; -7%-3==-1. So / truncates towards zero and a%b==a-(a/b)*b when a/b is representable. Notably -7%3!=7%-3 and 7%3!=-7%-3 .

It looks to me that the original code tried to minimize the number of divisions (and modulo's).

You are on right track, but off by a decade or two :-)

There were no 64 bit divide instruction in the i386 instruction set, and the software version in the gcc runtime were horribly slow by kernel standards.

Something to consider? :

It looks to me that the original code tried to minimize the number of divisions (and modulo's). This might have been because of the likes of, say, "ARMv7-A implementation that does not include the Virtualization Extensions" having SDIV and UDIV as optional in the profile and ARMv7-R only requiring such instructions for Thumb, not ARM. (Just examples, I've no general knowledge of the range of platform's status for such.)

Yes, but by doing a bunch of if's, which in turn kill performance, when the branch misses.

Try to compile the new code and look at the assembly.

I also wonder if factor or divisor or the like is intended to ever allowed to be negative in the use of %. (Some routines have signed types involved.) Checking C's language requirements for negatives involved in % operations could be important otherwise: If there is C implementation freedom, that could lead to variability in the results.

Looks like C89 was the last with "implementation defined" for / and % involving negative integers. After that: 7/3==2; 7%3==1; -7/3==-2; -7%3==-1; 7/-3==-2; 7/-3==1; -7/-3==2; -7%-3==-1. So / truncates towards zero and a%b==a-(a/b)*b when a/b is representable. Notably -7%3!=7%-3 and 7%3!=-7%-3 .

The divisor and modulus is always positive, but the result may be negative for signed operation, if the input value is negative, in my code.

Try to compile the new code and look at the assembly.

You mean the calls for armv7 when I compile a copy your test program (and a locally patched time.h copy as ./time.h and the #include adjusted)? :

. . .
<main+0x50> bl 000209b0 <__aeabi_uldivmod>
. . .
<main+0x90> bl 000209b0 <__aeabi_uldivmod>
. . .
<main+0x124> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x154> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x1f0> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x220> bl        000209b0 <__aeabi_uldivmod>
. . .

FYI: Using the system's normal time.h shows 3 such calls, not 6.

Side note: The compile also complains about the likes of:

# cc -O2 fbsd_time_conv.c
fbsd_time_conv.c:13:33: warning: format specifies type 'ssize_t' (aka 'int') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
                        printf("NS ERROR %zd %zd\n", z, x);
                                         ~~~         ^
                                         %lld
fbsd_time_conv.c:13:36: warning: format specifies type 'ssize_t' (aka 'int') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
                        printf("NS ERROR %zd %zd\n", z, x);
                                             ~~~        ^
                                             %lld
. . .

(modulo the style changes I listed)

sys/sys/time.h
169
171–177
177

... and the same changes over the functions below.

This revision is now accepted and ready to land.Oct 3 2022, 11:58 PM

Try to compile the new code and look at the assembly.

You mean the calls for armv7 when I compile a copy your test program (and a locally patched time.h copy as ./time.h and the #include adjusted)? :

. . .
<main+0x50> bl 000209b0 <__aeabi_uldivmod>
. . .
<main+0x90> bl 000209b0 <__aeabi_uldivmod>
. . .
<main+0x124> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x154> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x1f0> bl        000209b0 <__aeabi_uldivmod>
. . .
<main+0x220> bl        000209b0 <__aeabi_uldivmod>
. . .

FYI: Using the system's normal time.h shows 3 such calls, not 6.

Side note: The compile also complains about the likes of:

# cc -O2 fbsd_time_conv.c
fbsd_time_conv.c:13:33: warning: format specifies type 'ssize_t' (aka 'int') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
                        printf("NS ERROR %zd %zd\n", z, x);
                                         ~~~         ^
                                         %lld
fbsd_time_conv.c:13:36: warning: format specifies type 'ssize_t' (aka 'int') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
                        printf("NS ERROR %zd %zd\n", z, x);
                                             ~~~        ^
                                             %lld
. . .

Yes, that looks correct. You should also notice that going to SBT is slower than going from SBT.

I checked qdivrem() in the kernel, and It is highly optimized. For small timeouts it will return zero infact, so only one full qdivrem() will be needed.

Also for millisecond and microsecond conversion, the divisor now fits into 16-bits, and apparently __qdivrem() optimizes for this!