Page MenuHomeFreeBSD

Allow em(4) to particpate in auto-negotiation for fixed 100b or 10b configuration
ClosedPublic

Authored by fbsd_opal.com on Mar 4 2022, 11:42 PM.
Tags
None
Referenced Files
F102863706: D34449.diff
Mon, Nov 18, 3:51 AM
Unknown Object (File)
Wed, Nov 13, 5:39 AM
Unknown Object (File)
Sep 30 2024, 10:41 AM
Unknown Object (File)
Sep 27 2024, 12:42 AM
Unknown Object (File)
Sep 26 2024, 2:49 PM
Unknown Object (File)
Sep 23 2024, 3:56 AM
Unknown Object (File)
Sep 19 2024, 5:52 AM
Unknown Object (File)
Sep 18 2024, 9:39 AM

Details

Summary

Currently if an em(4) interface is set to a fixed media configuration, for 1000b, it will participate in auto-negotiation as required by IEEE 802.3-2018 Clause 37.
However, if set to fixed media configuration for 100b or 10b, it does NOT participate in auto-negotiation.

By my reading of Clauses 28 and 37, while auto-negotiation is optional for 100b and 10b, it is not prohibited and is, in fact, "highly recommended".

This patch enables auto-negotiation for fixed 100b and 10b media configuration, in a similar manner to that already performed for 1000b. I.e., the patch enables advertising of just the manually configured settings with the goal of allowing the remote end to match the manually configured settings if it has them available.

To be clear, this patch does NOT allow an em(4) interface that has been manually configured with specific media settings to respond to auto-negotiation by then configuring different parameters to those that were manually configured. The intent of this patch is to fully comply with the requirements of Clause 37, but for 100b and 10b.

The need for this has arisen on an em(4) link where the other end is under a different administrative control and is set to full auto-negotiation. Due to the cable length GigE is not working well. It is desired to set the em(4) end to "media 100baseTX mediatype full-duplex" which does work when both ends are configured that way. Currently, because em(4) does not participate in autoneg for this setting, the remote defaults to half-duplex - i.e., there's a duplex mismatch and things don't work. With this patch, em(4) would inform the remote that it has only 100baseTX full, the remote would match that and it will work.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

fbsd_opal.com created this revision.
erj edited reviewers, added: kbowling; removed: jfv.

Is anyone else going to look at this? This does seem alright to me, though I don't know how to answer the autoneg_wait_to_complete question.

This revision is now accepted and ready to land.Apr 12 2022, 8:18 PM

Thanks for your feedback, Eric.

I added the question about whether or not we need to set autoneg_wait_to_complete to FALSE in order to consider the case where em(4) would now attempt the autonegotiation dialog but the other end does not have autonegotiation capability, e.g., it is some old 10bt device. I was worried that waiting might cause things to block indefinitely.

On reading more of the code, especially e1000_phy.c e1000_copper_link_autoneg() and e1000_wait_autoneg(), I see the wait routines do include a timeout (currently 45 loop iterations with a 100ms delay). So waiting would not block indefinitely, but could cause a delay of up to 4.5s.

However, because the em(4) end has been manually configured to a specific setting, this means it will not be changing its configuration even if autonegotiation does take place. So, waiting would appear to be unnecessary.

I think hw->phy.autoneg_wait_to_complete is also set to false in the driver earlier in em_if_attach_pre() and never changes.

From what we can tell if the other end prohibits auto-negotiation forcing a particular media setting the NIC ends up in "no carrier" status with this patch, see https://www.reddit.com/r/opnsense/comments/xw4oiz/comment/ir4mxb0/?utm_source=reddit&utm_medium=web2x&context=3 and https://forum.opnsense.org/index.php?topic=30274

That would suggest that the lower-level code that is supposed to detect when the link partner does not auto-negotiate and then force the link state up, is not working in your case.

The functions e1000_check_for_serdes_link_generic() and e1000_check_for_serdes_link_82571() are where forcing the link up is done.

There is a lot of debugging code in the e1000 driver. It appears that, to enable it, you have to edit e1000_osdep.h and set DBG to 1 (about line 107).

I don't have any intention to debug this as I don't have a setup at hand that causes this. It should, however, be considered to revert the commit before 13.2 or 14.0 is released with it for the sole purpose of fixing a theoretical issue vs. breaking existing setups.

I can see where the "shared code" (e1000 api) is not ready in general and for lem(4). For the submitter, if you want to try and chase that down e1000_setup_copper_link_generic() in e1000_phy.c.. maybe try modifying the conditional. I'm going to revert this.

Disappointing that this has been reverted.
The problem that this fixes is now broken again.
It is unlikely that I will be able to debug Franco's problem as I do not have that problem myself. I am also away from my lab for the next several months and do not have equipment with me that I can use to test this situation.
It would be most helpful, Franco, if you could try to debug the code that is supposed to handle the no-autoneg link partner situation.

The problem is what you submitted is logically inconsistent in the e1000 code. Other consumers of this code (i.e. Linux, DPDK) have not plugged the holes either (for the record they do not implement this improvement). I've provided a starting pointer in the phy code. You will find at least one more issue in the lem(4) phy code. This is via ripgrep and code inspection, you don't need any hardware to see it.