Page MenuHomeFreeBSD

Switch to using drive-supplied timeouts for the sa(4) driver.
ClosedPublic

Authored by ken on Jan 13 2022, 9:17 PM.
Tags
None
Referenced Files
F102602038: D33883.id101426.diff
Thu, Nov 14, 3:49 PM
F102602026: D33883.id101426.diff
Thu, Nov 14, 3:49 PM
Unknown Object (File)
Sun, Nov 10, 6:16 AM
Unknown Object (File)
Sat, Nov 9, 4:34 PM
Unknown Object (File)
Thu, Nov 7, 11:44 PM
Unknown Object (File)
Thu, Nov 7, 4:37 PM
Unknown Object (File)
Thu, Nov 7, 11:10 AM
Unknown Object (File)
Wed, Nov 6, 8:12 PM

Details

Summary

The sa(4) driver has historically used tape drive timeouts that
were one-size fits all, with compile-time options to adjust a few
of them.

LTO-9 drives (and presumably other tape drives in the future)
implement a tape characterization process that happens the first
time a tape is loaded. The characterization process formats the
tape to account for the temperature and humidity in the environment
it is being used in. The process for LTO-9 tapes can take from 20
minutes (I have observed 17-18 minutes) to 2 hours according to the
documentation.

As a result, LTO-9 drives have significantly longer recommended
load times than previous LTO generations.

To handle this, change the sa(4) driver over to using timeouts
supplied by the tape drive using the timeout descriptors obtained
through the REPORT SUPPORTED OPERATION CODES command. That command
was introduced in SPC-4. IBM tape drives going back to at least
LTO-5 report timeout values. Oracle/Sun/StorageTek tape drives
going back to at least the T10000C report timeout values. HP LTO-5
and newer drives report timeout values. The sa(4) driver only
queries drives that claim to support SPC-4.

This makes the timeout settings automatic and accurate for newer
tape drives.

Also, add loader tunable and sysctl support so that the user can
override individual command type timeouts for all tape drives in
the system, or only for specific drives.

The new global (these affect all tape drives) loader tunables are:

kern.cam.sa.timeout.erase
kern.cam.sa.timeout.load
kern.cam.sa.timeout.locate
kern.cam.sa.timeout.mode_select
kern.cam.sa.timeout.mode_sense
kern.cam.sa.timeout.prevent
kern.cam.sa.timeout.read
kern.cam.sa.timeout.read_position
kern.cam.sa.timeout.read_block_limits
kern.cam.sa.timeout.report_density
kern.cam.sa.timeout.reserve
kern.cam.sa.timeout.rewind
kern.cam.sa.timeout.space
kern.cam.sa.timeout.tur
kern.cam.sa.timeout.write
kern.cam.sa.timeout.write_filemarks

The new per-instance loader tunable / sysctl variables are:

kern.cam.sa.%d.timeout.erase
kern.cam.sa.%d.timeout.load
kern.cam.sa.%d.timeout.locate
kern.cam.sa.%d.timeout.mode_select
kern.cam.sa.%d.timeout.mode_sense
kern.cam.sa.%d.timeout.prevent
kern.cam.sa.%d.timeout.read
kern.cam.sa.%d.timeout.read_position
kern.cam.sa.%d.timeout.read_block_limits
kern.cam.sa.%d.timeout.report_density
kern.cam.sa.%d.timeout.reserve
kern.cam.sa.%d.timeout.rewind
kern.cam.sa.%d.timeout.space
kern.cam.sa.%d.timeout.tur
kern.cam.sa.%d.timeout.write
kern.cam.sa.%d.timeout.write_filemarks

The values are reported and set in units of thousandths of a
second.

share/man/man4/sa.4:
Document the new loader tunables in the sa(4) man page.

sys/cam/scsi/scsi_sa.c:
Add a new timeout_info array to the softc.

Add a default timeouts array, along with descriptions.

Add a new sysctl tree to the softc to handle the timeout
sysctl values.

Add a new function, saloadtotunables(), that will load
the global loader tunables first and then any per-instance
loader tunables second.

Add creation of the new timeout sysctl variables in
sasysctlinit().

Add a new, optional probe state to the sa(4) driver. We
previously didn't do any probing, but now we probe for
timeout descriptors if the drive claims to support SPC-4 or
later. In saregister(), we check the SCSI revision and
either launch the probe state machine, or announce the
device and become ready.

In sastart() and sadone(), add support for the new
SA_STATE_PROBE. If we're probing, we don't go through
saerror(), since that is currently only written to handle
I/O errors in the normal state.

Change every place in the sa(4) driver that fills in
timeout values in a CCB to use the new timeout_info[] array
in the softc.

Add a new saloadtimeouts() routine to parse the returned
timeout descriptors from a completed REPORT SUPPORTED
OPERATION CODES command, and set the values for the
commands we support.

MFC after: 1 week
Sponsored by: Spectra Logic

Test Plan

Try this out with a variety of tape drives and make sure the timeouts that
result (sysctl kern.cam.sa to see them) are reasonable.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 43925
Build 40813: arc lint + arc unit

Event Timeline

ken requested review of this revision.Jan 13 2022, 9:17 PM

I don't know about how these timeouts work. It seems to be good, though, apart from the sysctl thing.

sys/cam/scsi/scsi_sa.c
2413

If you add CTLFLAG_TUN here, you don't need saloadtotunables

sys/cam/scsi/scsi_sa.c
2413

Yes, that would accomplish loading the per-instance (sa0, sa1, etc.) tunables, but not the global tunables, and would introduce a race.

The idea is we have the following in order of increasing priority:

  1. Driver default timeouts.
  2. Timeouts loaded from the drive via REPORT SUPPORTED OPERATION CODES.
  3. Global loader tunables, used for all sa(4) driver instances on a machine.
  4. Instance-specific loader tunables, used for say sa5.
  5. On the fly user sysctl changes.

Each step will overwrite the timeout value set from the one before, so you go from general to most specific.

The problem is that the sysctl variables are created from a thread, and the per-instance tunable values would be loaded whenever the thread runs. That could be before or after we get the timeouts from the drive, and so we would have undefined results -- we could wind up with the user's preference or the drive setting depending on which one runs first. So I intentionally didn't load the per-instance tunables via the sysctl flag, so that I could control the ordering.

Other that that clarification request, the spelling, grammar, and contents LGTM. Others will have to check consistency between code and manual page.

share/man/man4/sa.4
356

Clarification request (I can't read code well enough to tell): if a tape drive appears after boot (USB, camcontrol rescan, whatever) and barring any action in devd.conf or wherever, the manual page should explicitly state whether these tunables will override its default timeouts as well.

Address documentation feedback, clarify that loader tunables will still be
used for tape drives that arrive after boot.

ken marked an inline comment as done.Jan 14 2022, 4:13 PM
ken added inline comments.
share/man/man4/sa.4
356

Done, I added another sentence explaining that the tunables are used even when a drive arrives after boot.

I have a little bit of mdoc(7) feedback.

share/man/man4/sa.4
332–333

I'd recommend using the St macro for referencing standards.

Also note that the sentence will need to be reflowed.

341

Is there a reason why THOUSANDTHS is capitalized here?

ken marked an inline comment as done.Jan 14 2022, 9:52 PM
ken added inline comments.
share/man/man4/sa.4
341

Is there a reason why THOUSANDTHS is capitalized here?

For emphasis. The usual expectation for timeouts like this is that they would be specified in seconds. Sometimes you see things in milliseconds (hundredths of a second) but thousandths of a second is a bit unusual. The reason is because that is how they're used inside the kernel, and this allows just setting the values directly instead of having to validate and multiply the sysctl or tunable input.

If you think it would be better to do it in bold or something, that is certainly possible.

imp added inline comments.
sys/cam/scsi/scsi_sa.c
2413

OK. Please add this comment as a comment in the code because it's not at all obvious that this race exists

This revision is now accepted and ready to land.Jan 14 2022, 9:55 PM

Other than that markup nit, LGTM now.

share/man/man4/sa.4
341

If you think it would be better to do it in bold or something, that is certainly possible.

I think .Em would be appropriate here per mdoc(7).

I'm gonna go ahead and approve this, and leave the choice of macro up to you prior to when you commit and push. :)

share/man/man4/sa.4
341

If you think it would be better to do it in bold or something, that is certainly possible.

As Pau Amma points out, mdoc(7) is the canonical reference for the syntax used.

It's up to you if you pick .Em or .Sy, as either is used for emphasis - although I'd say that .Em is more often used for accentuation whereas .Sy seems to be intended for indicating significance.

So the question is, do you prefer italics/underlining depending on how it's viewed or if you always want it bolded. Since you mention bolding, I'd suggest using .Sy.

341

I think .Em would be appropriate here per mdoc(7).

There's no clear style guide for it, and a quick use of find /usr/src/share/man/ -type f -exec grep (.Em|.Sy) {} + | wc -l shows there's not a whole lot of difference in the number of uses.

I think it comes down to how one wants it rendered.

share/man/man4/sa.4
332–333

I'd recommend using the St macro for referencing standards.

Also note that the sentence will need to be reflowed.

It looks like the St macro only works properly if the standard is already defined (somewhere in the mdoc macros), and is used to expand an abbreviation to the full name. That isn't quite what we want here (I think we need both), and the suggested format above yields this:

"For newer tape drives that claim to support the standard (SCSI Primary Commands 4) or later standards, the"

Note that SPC-4 doesn't appear before standard, and as a result, the text isn't quite right.

Update sa(4) comments and man page after review.

sys/cam/scsi/scsi_sa.c:
Add comments explaining the priority order of the various
sources of timeout values. Also, explain that the probe
that pulls in drive recommended timeouts via the REPORT
SUPPORTED OPERATION CODES command is in a race with the
thread that creates the sysctl variables. Because of that
race, it is important that the sysctl thread not load any
timeout values from the kernel environment.

share/man/man4/sa.4:
Use the Sy macro to emphasize thousandths of a second
instead of capitalizing it.

This revision now requires review to proceed.Jan 18 2022, 3:50 PM

Manual page looks good.

This revision is now accepted and ready to land.Jan 18 2022, 5:46 PM