
ixl(4): Fix reporting of unqualified transceivers
ClosedPublic

Authored by krzysztof.galazka_intel.com on Jun 11 2021, 4:28 PM.
Tags
None

Details

Summary

When the link_active_on_if_down flag is disabled and the link is turned
down with ifconfig, FW reports a false positive link event
about an unqualified transceiver. The condition the driver used to
filter out those false positive events was incorrect: it also
suppressed the unqualified-module message when the event was valid.
Change the condition to rely on the IFF_UP flag instead of
link_active_on_if_down, and bump the driver version to 2.3.1-k.
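The summary's change can be illustrated with a small sketch. It is modeled on the description above, not copied from the diff. We assume the message is warranted when media is present, the link is down, and the module is not flagged as qualified. The sketch compares two ways of gating that check: on the link_active_on_if_down tunable (old) and on IFF_UP (new). The helper names are hypothetical. The bit values are as we understand them from the i40e admin-queue header, and they match the MA: 64 / QUAL: 128 values in the logs below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values (i40e admin-queue header); verify against your tree. */
#define I40E_AQ_LINK_UP          0x01 /* in link_info */
#define I40E_AQ_MEDIA_AVAILABLE  0x40 /* in link_info */
#define I40E_AQ_QUALIFIED_MODULE 0x80 /* in an_info */
#define SKETCH_IFF_UP            0x1  /* value of IFF_UP in FreeBSD <net/if.h> */

/* Hypothetical helper: FW state that looks like an unqualified module. */
static bool
looks_unqualified(uint8_t link_info, uint8_t an_info)
{
	return (link_info & I40E_AQ_MEDIA_AVAILABLE) != 0 &&
	    (link_info & I40E_AQ_LINK_UP) == 0 &&
	    (an_info & I40E_AQ_QUALIFIED_MODULE) == 0;
}

/* Old gate: only report when link_active_on_if_down is enabled, so with the
 * tunable at 0 even a genuinely unqualified module is never reported. */
static bool
report_unqualified_old(bool link_active_on_if_down, uint8_t link_info,
    uint8_t an_info)
{
	return link_active_on_if_down && looks_unqualified(link_info, an_info);
}

/* New gate: only report while the interface is administratively up, which
 * drops the false positive FW sends after "ifconfig ixlN down". */
static bool
report_unqualified_new(int if_flags, uint8_t link_info, uint8_t an_info)
{
	return (if_flags & SKETCH_IFF_UP) != 0 &&
	    looks_unqualified(link_info, an_info);
}
```

Consider the case where the tunable is 0 and the interface is up. An event carrying link_info 0x40 and an_info 0x00 is dropped by the old gate but reported by the new one. After `ifconfig ixl3 down`, the flags no longer include IFF_UP, so the new gate drops the same event.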

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 40643
Build 37532: arc lint + arc unit

Event Timeline

Also, in the original change, https://reviews.freebsd.org/D28028, I noticed that after executing ixl_set_link(pf, false), the PHY capabilities query returns an_info with I40E_AQ_QUALIFIED_MODULE unset. So the same supported/qualified module becomes unqualified.
I think the crux of the problem is ixl_set_link() unsetting I40E_AQ_QUALIFIED_MODULE.

sys/dev/ixl/ixl_pf_iflib.c
419–420

This change appears to suppress the "unqualified module" message whenever the link goes down for any reason.
Consider admin down versus a truly unqualified/unsupported module. If, say, I insert a SAS or FC cable into the port, the link is naturally down, but we will lose the ability to report "unqualified module" as well.

There is no way other than setting phy_type for the driver to reliably disable and re-enable a link. The side effect is that when the link is disabled, FW unsets the I40E_AQ_QUALIFIED_MODULE flag. To avoid logging a false positive message about an unqualified module, the driver needs to filter out events received from FW after the interface is brought down with ifconfig.

sys/dev/ixl/ixl_pf_iflib.c
419–420

The IFF_UP flag is controlled by ifconfig and does not depend on the link state reported by FW. When a user brings the interface up and FW reports link down due to an unqualified module, the message will be reported.

Krzysztof,

So, thinking about this, my guess is that when you reboot the machine, we would see "unqualified" for a qualified cable because FW sees the link as down. I also guess that the cable will show up as unqualified when you shut the link on the link partner.

I have applied the patch, rebooted my machine, and I see the "unqualified" message for a good cable.
The good news is that with this patch I do not see the "unqualified" message for an admin link-down.

This revision now requires changes to proceed. Jun 16 2021, 3:02 PM

I'm testing both scenarios with this transceiver:

plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
vendor: Intel Corp PN: AFBR-703SDZ-IN2 SN: AD1432A0AY9 DATE: 2014-08-11

and I don't see the unqualified message in dmesg.

Shutting down the link on the link partner does not affect what FW reports if the module is qualified.

Could you please provide your configuration (rc.conf, loader.conf) and the exact steps to reproduce?

Yes, Krzysztof, I do not see the issue either for an immediate link-down event on the partner. But I do see the "unqualified" message during overnight link-toggle tests.

I have connected the cable back-to-back between two servers and rebooted them both at the same time. During boot, on both nodes, the driver receives a link event immediately after it attaches. That event shows MEDIA_AVAILABLE + IFF_UP + UNQUALIFIED + NO_LINK_UP.

My cable is
plugged: QSFP+ 40GBASE-CR4 (No separable connector)
vendor: Molex Inc. PN: 112-00322 SN: 524720492

The nodes are NetApp platforms. My code base is not HoL; I just patched this change into my code base to give it a try.

Again Krzysztof, I presume your tests have enabled IXL_PF_STATE_LINK_ACTIVE_ON_DOWN.

I'm testing with a 4-port adapter and the following config:

/boot/loader.conf:
dev.ixl.0.link_active_on_if_down=1
dev.ixl.3.link_active_on_if_down=0

/etc/rc.conf:
ifconfig_ixl0=190.2.20.1/16
ifconfig_ixl3=190.3.20.1/16

To ensure that the link state is correct, the driver sets it during attach according to the link_active_on_if_down tunable. This triggers a link event, but IFF_UP is not yet set during attach:
ixl3: ixl_set_link enable: 0
ixl3: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 128
ixl3: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0

Then the interface is brought up with an ioctl call:
ixl3: ixl_set_link enable: 1
ixl3: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128
ixl3: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
ixl3: link state changed to UP

With link_active_on_if_down=1, FW correctly reports that the module is qualified in every link event:

ixl0: ixl_set_link enable: 1
ixl0: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 128
ixl0: ixl_link_event IFF_UP: 0 MA: 64, LUP: 1 QUAL: 128
ixl0: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
ixl0: link state changed to UP

ixl0: ixl_set_link enable: 1
ixl0: ixl_link_event IFF_UP: 1 MA: 64, LUP: 0 QUAL: 128
ixl0: link state changed to DOWN
ixl0: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128
ixl0: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
ixl0: link state changed to UP

Krzysztof,

My run almost matches your test results, but with the following differences.

On node reboot (where the link partner is a Cisco switch):

  1. During attach, when ixl_set_link(pf, 0) gets invoked, I get the ixl_link_event() output you described:

e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 128
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0 <<<< repeats some 50+ times
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
...
...

  2. The NetApp networking stack brings up the link (i.e., ifhwioctl() is invoked to bring up the link):

e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 0 QUAL: 0 <<<<<
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128

So I get a spurious link event where both I40E_AQ_LINK_UP and I40E_AQ_QUALIFIED_MODULE are unset. At this stage, I get the "unqualified" message on a working/qualified transceiver.
I'm not really sure why I get a spurious link event and you don't. This has something to do with link auto-negotiation, and the output depends on the link partner. I think it's pretty much OK for the link to bounce while negotiating.
Maybe we need to wait for negotiation to complete before checking for and printing the "unqualified" message?

In the admin link-down case (i.e., ifconfig e2a down), I see:

e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 128
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0

In the admin link-up case (i.e., ifconfig e2a up), I see:

e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128

In this case, there is no spurious event.
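Sai's two sequences can be replayed offline against the IFF_UP-based condition. The following is a minimal sketch with two assumptions. First, it expects the `IFF_UP/MA/LUP/QUAL` debug printf format used in this thread. Second, it treats "interface up, media available, link down, not qualified" as the reporting condition. That condition is our reading of the flags Sai lists, not the driver's exact code. `replay_event_line()` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Replays one debug line of the form
 *   "e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 0 QUAL: 0"
 * against the assumed reporting condition (interface up, media available,
 * link down, module not qualified).
 * Returns 1 if the message would be printed, 0 if not, -1 if the line does
 * not match the format.
 */
static int
replay_event_line(const char *line)
{
	int iff_up, ma, lup, qual;

	assert(line != NULL);
	/* %*s skips the interface prefix ("e2a:", "ixl3:", ...). */
	if (sscanf(line, "%*s ixl_link_event IFF_UP: %d MA: %d, LUP: %d QUAL: %d",
	    &iff_up, &ma, &lup, &qual) != 4)
		return -1;
	return (iff_up != 0 && ma != 0 && lup == 0 && qual == 0) ? 1 : 0;
}
```

When fed the bring-up sequence, only the `IFF_UP: 1 MA: 64, LUP: 0 QUAL: 0` line returns 1. That matches the unexpected message on a qualified cable. The admin-down lines, and Krzysztof's attach-time lines, return 0 because IFF_UP is clear.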

Thank you Sai! Now I understand what I should be looking for.

I'm still not able to reproduce this issue with any of my switches. Could you please modify the printf in the ixl_link_event function to dump the hex values of status->link_info, status->an_info, hw->phy.link_info.link_info and hw->phy.link_info.an_info, and send me the log?
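Here is a sketch of the requested instrumentation, written as userspace C so it can be compiled on its own. `struct link_snapshot` is a hypothetical stand-in for two sources: the event payload (`status->...`) and the cached Get Link Status data (`hw->phy.link_info...`). `snprintf()` stands in for the driver's `device_printf()`. The format mirrors the log lines Sai posted in reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the four fields requested above. */
struct link_snapshot {
	uint8_t event_link_info; /* status->link_info */
	uint8_t event_an_info;   /* status->an_info */
	uint8_t hw_link_info;    /* hw->phy.link_info.link_info */
	uint8_t hw_an_info;      /* hw->phy.link_info.an_info */
};

/* Formats the snapshot as hex; returns what snprintf() returns. */
static int
format_link_snapshot(char *buf, size_t len, const struct link_snapshot *s)
{
	assert(buf != NULL && s != NULL);
	return snprintf(buf, len,
	    "ixl_link_event Status_LN: 0x%x, HW_LN: 0x%x, "
	    "Status_AN: 0x%x, HW_AN: 0x%x",
	    (unsigned)s->event_link_info, (unsigned)s->hw_link_info,
	    (unsigned)s->event_an_info, (unsigned)s->hw_an_info);
}
```

Printing the event copy and the cached copy side by side is the point of the exercise. A mismatch between them on the same line shows that the event payload and the latest Get Link Status data disagree.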

Krzysztof,

  1. During attach, ixl_set_link(pf, 0) gets invoked and I see:

e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xca, Status_AN: 0x80, HW_AN: 0x80
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 128
..
e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xca, Status_AN: 0x0, HW_AN: 0x0 <<<< repeats some 50+ times
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xca, Status_AN: 0x0, HW_AN: 0x0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
..
..

  2. Link bring-up ioctl:

e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xca, Status_AN: 0x0, HW_AN: 0x80 <<<<< spurious event
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 0 QUAL: 0 <<< spurious event
e2a: ixl_link_event Status_LN: 0xe1, HW_LN: 0xe1, Status_AN: 0x80, HW_AN: 0x80
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128

A similar run on another machine has the same effect. All logs are the same except the "spurious" event, where hw->phy.link_info.link_info is now 0xD2:

e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xD2, Status_AN: 0x0, HW_AN: 0x80 <<<
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event Status_LN: 0xe1, HW_LN: 0xe1, Status_AN: 0x80, HW_AN: 0x80
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128
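To read Sai's hex values, it helps to split them into flags. The sketch below assumes the link_info and an_info bit layout of the i40e admin-queue header as we understand it; check it against i40e_adminq_cmd.h in your tree. `decode_link()` and `struct link_flags` are hypothetical. Under that layout:

- 0xca means media available and signal detected, with a link fault and an RX fault, but no link up.
- 0xd2 has a remote fault instead of the RX fault.
- 0xe1 has link up.
- 0x80 in an_info is the qualified-module bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed link_info bits (i40e admin-queue header). */
#define I40E_AQ_LINK_UP           0x01
#define I40E_AQ_LINK_FAULT        0x02
#define I40E_AQ_LINK_FAULT_RX     0x08
#define I40E_AQ_LINK_FAULT_REMOTE 0x10
#define I40E_AQ_LINK_UP_PORT      0x20
#define I40E_AQ_MEDIA_AVAILABLE   0x40
#define I40E_AQ_SIGNAL_DETECT     0x80
/* Assumed an_info bit. */
#define I40E_AQ_QUALIFIED_MODULE  0x80

/* Hypothetical decoded view of one (link_info, an_info) pair. */
struct link_flags {
	bool link_up;
	bool link_fault;
	bool rx_fault;
	bool remote_fault;
	bool up_port;
	bool media_available;
	bool signal_detect;
	bool qualified;
};

static struct link_flags
decode_link(uint8_t link_info, uint8_t an_info)
{
	struct link_flags f = {
		.link_up = (link_info & I40E_AQ_LINK_UP) != 0,
		.link_fault = (link_info & I40E_AQ_LINK_FAULT) != 0,
		.rx_fault = (link_info & I40E_AQ_LINK_FAULT_RX) != 0,
		.remote_fault = (link_info & I40E_AQ_LINK_FAULT_REMOTE) != 0,
		.up_port = (link_info & I40E_AQ_LINK_UP_PORT) != 0,
		.media_available = (link_info & I40E_AQ_MEDIA_AVAILABLE) != 0,
		.signal_detect = (link_info & I40E_AQ_SIGNAL_DETECT) != 0,
		.qualified = (an_info & I40E_AQ_QUALIFIED_MODULE) != 0,
	};
	return f;
}
```

Decoding Status_AN 0x0 and HW_AN 0x80 from the same spurious line makes the disagreement explicit. The event says "not qualified" while the cached status says "qualified".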

  • Do not rely on information from link event

After delivering a link event, FW disables such events until
they are re-enabled with an AQC call. The link state may
change before events are re-enabled, and the driver
may miss it. To avoid this, do not rely on information
from the event. Instead, use the most recent status info retrieved
with a Get Link Status call.

I don't think we can do much about those spurious events in the driver, but the AN status retrieved with the Get Link Status AQC has the correct information. Using it instead of the information from the event should help.

Thanks, Krzysztof. Let's go with hw->phy.link_info.

This revision is now accepted and ready to land. Aug 20 2021, 4:09 PM