0001 =========================
0002 NXP SJA1105 switch driver
0003 =========================
0004 
0005 Overview
0006 ========
0007 
0008 The NXP SJA1105 is a family of 10 SPI-managed automotive switches:
0009 
0010 - SJA1105E: First generation, no TTEthernet
0011 - SJA1105T: First generation, TTEthernet
0012 - SJA1105P: Second generation, no TTEthernet, no SGMII
0013 - SJA1105Q: Second generation, TTEthernet, no SGMII
0014 - SJA1105R: Second generation, no TTEthernet, SGMII
0015 - SJA1105S: Second generation, TTEthernet, SGMII
0016 - SJA1110A: Third generation, TTEthernet, SGMII, integrated 100base-T1 and
0017   100base-TX PHYs
0018 - SJA1110B: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX
0019 - SJA1110C: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX
0020 - SJA1110D: Third generation, TTEthernet, SGMII, 100base-T1
0021 
0022 Being automotive parts, their configuration interface is geared towards
0023 set-and-forget use, with minimal dynamic interaction at runtime. They
0024 require a static configuration to be composed by software and packed
0025 with CRC and table headers, and sent over SPI.
0026 
0027 The static configuration is composed of several configuration tables. Each
0028 table takes a number of entries. Some configuration tables can be (partially)
0029 reconfigured at runtime, some not. Some tables are mandatory, some not:
0030 
0031 ============================= ================== =============================
0032 Table                          Mandatory          Reconfigurable
0033 ============================= ================== =============================
0034 Schedule                       no                 no
0035 Schedule entry points          if Scheduling      no
0036 VL Lookup                      no                 no
0037 VL Policing                    if VL Lookup       no
0038 VL Forwarding                  if VL Lookup       no
0039 L2 Lookup                      no                 no
0040 L2 Policing                    yes                no
0041 VLAN Lookup                    yes                yes
0042 L2 Forwarding                  yes                partially (fully on P/Q/R/S)
0043 MAC Config                     yes                partially (fully on P/Q/R/S)
0044 Schedule Params                if Scheduling      no
0045 Schedule Entry Points Params   if Scheduling      no
0046 VL Forwarding Params           if VL Forwarding   no
0047 L2 Lookup Params               no                 partially (fully on P/Q/R/S)
0048 L2 Forwarding Params           yes                no
0049 Clock Sync Params              no                 no
0050 AVB Params                     no                 no
0051 General Params                 yes                partially
0052 Retagging                      no                 yes
0053 xMII Params                    yes                no
0054 SGMII                          no                 yes
0055 ============================= ================== =============================
0056 
0057 
0058 In addition, the configuration is write-only: with very few exceptions,
0059 software cannot read it back from the switch.
0060 
0061 The driver creates a static configuration at probe time, and keeps it at
0062 all times in memory, as a shadow for the hardware state. When required to
0063 change a hardware setting, the static configuration is also updated.
0064 If that changed setting can be transmitted to the switch through the dynamic
0065 reconfiguration interface, it is; otherwise the switch is reset and
0066 reprogrammed with the updated static configuration.
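
The policy described above can be sketched in shell. This only illustrates the decision and is not the driver's code; the helper name ``apply_method`` is hypothetical, and the table names and their classification are taken from the "Reconfigurable" column of the table above (first-generation values, where some tables are only partially reconfigurable):

```shell
# Hypothetical helper: report how a change to a given configuration table
# would reach the hardware, based on the "Reconfigurable" column above.
apply_method() {
        case "$1" in
        "VLAN Lookup" | "Retagging" | "SGMII")
                echo "dynamic" ;;           # reconfigurable at runtime
        "L2 Forwarding" | "MAC Config" | "L2 Lookup Params" | "General Params")
                echo "dynamic-or-reset" ;;  # only some fields are dynamic
        *)
                echo "reset" ;;             # reset + upload of the static config
        esac
}

apply_method "VLAN Lookup"
apply_method "xMII Params"
```

The first call reports ``dynamic`` and the second ``reset``, matching the "yes" and "no" entries of the table.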
0067 
0068 Switching features
0069 ==================
0070 
0071 The driver supports the configuration of L2 forwarding rules in hardware for
0072 port bridging. The forwarding, broadcast and flooding domain between ports can
0073 be restricted through two methods: either at the L2 forwarding level (isolate
0074 one bridge's ports from another's) or at the VLAN port membership level
0075 (isolate ports within the same bridge). The final forwarding decision taken by
0076 the hardware is a logical AND of these two sets of rules.
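
As a worked example of the logical AND, consider a frame received on port 0. The port masks below are hypothetical values chosen for illustration (bit N stands for port N); they are not read from the hardware:

```shell
# Hypothetical port masks (bit N = port N) for a frame received on port 0.
l2_fwd_mask=$((0x0e))   # L2 forwarding: port 0 may reach ports 1, 2 and 3
vlan_mask=$((0x07))     # VLAN membership: ports 0, 1 and 2 are in the VLAN

# The frame may only egress ports allowed by both sets of rules.
egress_mask=$((l2_fwd_mask & vlan_mask))
printf "egress mask: 0x%02x\n" "${egress_mask}"   # 0x06, i.e. ports 1 and 2
```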
0077 
0078 The hardware tags all traffic internally with a port-based VLAN (pvid), or it
0079 decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification
0080 is not possible. Once attributed a VLAN tag, frames are checked against the
0081 port's membership rules and dropped at ingress if they don't match any VLAN.
0082 This behavior is available when switch ports are enslaved to a bridge with
0083 ``vlan_filtering 1``.
0084 
0085 Normally the hardware is not configurable with respect to VLAN awareness, but
0086 by changing what TPID the switch searches 802.1Q tags for, the semantics of a
0087 bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or
0088 untagged), and therefore this mode is also supported.
0089 
0090 Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but
0091 all bridges should have the same level of VLAN awareness (either both have
0092 ``vlan_filtering`` 0, or both 1).
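
A minimal sketch of a 2 + 2 split, assuming the switch ports are named ``swp0`` to ``swp3`` and the bridges ``br0`` and ``br1`` (all of these names are placeholders). The hypothetical helper ``bridge_cmds`` only prints the iproute2 commands so that they can be reviewed first:

```shell
# Print the iproute2 commands that create one bridge and attach ports to it.
# Usage: bridge_cmds <bridge> <vlan_filtering> <port>...
bridge_cmds() {
        local br="$1" filtering="$2" port
        shift 2
        echo "ip link add name ${br} type bridge vlan_filtering ${filtering}"
        for port in "$@"; do
                echo "ip link set dev ${port} master ${br}"
        done
}

# Both bridges use the same vlan_filtering value, as required above.
bridge_cmds br0 1 swp0 swp1
bridge_cmds br1 1 swp2 swp3
```

Piping the output into ``sh`` as root would apply the configuration.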
0093 
0094 Topology and loop detection through STP is supported.
0095 
0096 Offloads
0097 ========
0098 
0099 Time-aware scheduling
0100 ---------------------
0101 
0102 The switch supports a variation of the enhancements for scheduled traffic
0103 specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to
0104 ensure deterministic latency for priority traffic that is sent in-band with its
0105 gate-open event in the network schedule.
0106 
0107 This capability can be managed through the tc-taprio offload ('flags 2'). The
0108 difference compared to the software implementation of taprio is that the latter
0109 would only be able to shape traffic originated from the CPU, but not
0110 autonomously forwarded flows.
0111 
0112 The device has 8 traffic classes, and maps incoming frames to one of them based
0113 on the VLAN PCP bits (if no VLAN is present, the port-based default is used).
0114 As described in the previous sections, depending on the value of
0115 ``vlan_filtering``, the EtherType recognized by the switch as being VLAN can
0116 either be the typical 0x8100 or a custom value used internally by the driver
0117 for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone
0118 or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100
0119 EtherType. In these modes, injecting into a particular TX queue can only be
0120 done by the DSA net devices, which populate the PCP field of the tagging header
0121 on egress. Using ``vlan_filtering=1``, the behavior is the other way around:
0122 offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA
0123 net devices are no longer able to do that. To inject frames into a hardware TX
0124 queue with VLAN awareness active, it is necessary to create a VLAN
0125 sub-interface on the DSA master port, and send normal (0x8100) VLAN-tagged
0126 frames towards the switch, with the VLAN PCP bits set appropriately.
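
One way to set the PCP bits on such a sub-interface is the ``egress-qos-map`` option of ``ip link``, which maps the Linux packet priority to the PCP field. The sketch below assumes a DSA master named ``eth0`` and VLAN ID 100, both placeholders, and uses a hypothetical helper ``pcp_identity_map``; it prints the command instead of running it:

```shell
# Build an identity "packet priority:PCP" map for all 8 priorities.
pcp_identity_map() {
        local prio map=""
        for prio in 0 1 2 3 4 5 6 7; do
                map="${map}${map:+ }${prio}:${prio}"
        done
        echo "${map}"
}

# Print the command that creates the VLAN sub-interface on the DSA master.
# "eth0" and VID 100 are placeholders for this example.
echo "ip link add link eth0 name eth0.100 type vlan id 100" \
        "egress-qos-map $(pcp_identity_map)"
```

With this map, a socket whose priority is set to N sends frames with PCP N through ``eth0.100``.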
0127 
0128 Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-1B-19-xx-xx-xx) is the
0129 notable exception: the switch always treats it with a fixed priority and
0130 disregards any VLAN PCP bits even if present. The traffic class for management
0131 traffic has a value of 7 (highest priority) at the moment, which is not
0132 configurable in the driver.
0133 
0134 Below is an example of configuring a 500 us cyclic schedule on egress port
0135 ``swp5``. The traffic class gate for management traffic (7) is open for 100 us,
0136 and the gates for all other traffic classes are open for 400 us::
0137 
0138   #!/bin/bash
0139 
0140   set -e -u -o pipefail
0141 
0142   NSEC_PER_SEC="1000000000"
0143 
0144   gatemask() {
0145           local tc_list="$1"
0146           local mask=0
0147 
0148           for tc in ${tc_list}; do
0149                   mask=$((${mask} | (1 << ${tc})))
0150           done
0151 
0152           printf "%02x" ${mask}
0153   }
0154 
0155   if ! systemctl is-active --quiet ptp4l; then
0156           echo "Please start the ptp4l service" >&2
0157           exit 1
0158   fi
0159 
0160   now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
0161   # Phase-align the base time to the start of the next second.
0162   sec=$(echo "${now}" | gawk -F. '{ print $1; }')
0163   base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"
0164 
0165   tc qdisc add dev swp5 parent root handle 100 taprio \
0166           num_tc 8 \
0167           map 0 1 2 3 4 5 6 7 \
0168           queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
0169           base-time ${base_time} \
0170           sched-entry S $(gatemask 7) 100000 \
0171           sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
0172           flags 2
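
To make the numbers in the script concrete, the snippet below restates ``gatemask`` and evaluates it for the two schedule entries, together with the cycle time and the base time. The PHC timestamp is a made-up sample value, not real output of ``phc_ctl``:

```shell
# Same logic as gatemask() in the script above: set one bit per traffic class.
gatemask() {
        local tc mask=0
        for tc in $1; do
                mask=$((mask | (1 << tc)))
        done
        printf "%02x" "${mask}"
}

echo "management gate: $(gatemask "7")"               # gate mask 80
echo "all other gates: $(gatemask "0 1 2 3 4 5 6")"   # gate mask 7f
echo "cycle time (ns): $((100000 + 400000))"          # 500000 ns = 500 us

now="1597.123456789"    # hypothetical PHC reading, in seconds
sec="${now%%.*}"        # drop the fractional part
echo "base-time (ns):  $(( (sec + 1) * 1000000000 ))" # 1598000000000
```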
0173 
0174 It is possible to apply the tc-taprio offload on multiple egress ports. There
0175 are hardware restrictions related to the fact that no gate event may trigger
0176 simultaneously on two ports. The driver checks the consistency of the schedules
0177 against this restriction and errors out when appropriate. Schedule analysis is
0178 needed to avoid this, which is outside the scope of this document.
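
As a rough illustration of the restriction, the sketch below checks offline whether two schedules would fire a gate event in the same slot. It is a simplification and not the driver's algorithm: it assumes both schedules have the same cycle length, describes each one as a base offset followed by its interval durations (all in nanoseconds), and uses the 200 ns slot width that this document mentions in the tc-gate section. The function names are hypothetical:

```shell
# Print the offsets (ns) at which gate events fire during one cycle.
# Usage: gate_events <base offset> <interval>...
gate_events() {
        local t="$1" d
        shift
        for d in "$@"; do
                echo "${t}"
                t=$((t + d))
        done
}

# Exit status 0 if the two schedules have a gate event in the same 200 ns slot.
# Each argument is one string: "<base offset> <interval> <interval> ...".
schedules_collide() {
        local ea eb
        for ea in $(gate_events $1); do
                for eb in $(gate_events $2); do
                        [ $((ea / 200)) -eq $((eb / 200)) ] && return 0
                done
        done
        return 1
}

schedules_collide "0 100000 400000" "0 100000 400000" && echo "conflict"
schedules_collide "0 100000 400000" "1000 100000 400000" || echo "ok"
```

Two identical schedules conflict on every event, while shifting the second one by 1 us moves its events into different slots.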
0179 
0180 Routing actions (redirect, trap, drop)
0181 --------------------------------------
0182 
0183 The switch is able to offload flow-based redirection of packets to a set of
0184 destination ports specified by the user. Internally, this is implemented by
0185 making use of Virtual Links, a TTEthernet concept.
0186 
0187 The driver supports 2 types of keys for Virtual Links:
0188 
0189 - VLAN-aware virtual links: these match on destination MAC address, VLAN ID and
0190   VLAN PCP.
0191 - VLAN-unaware virtual links: these match on destination MAC address only.
0192 
0193 The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while
0194 there are virtual link rules installed.
0195 
0196 Composing multiple actions inside the same rule is supported. When only routing
0197 actions are requested, the driver creates a "non-critical" virtual link. When
0198 the action list also contains tc-gate (more details below), the virtual link
0199 becomes "time-critical" (draws frame buffers from a reserved memory partition,
0200 etc).
0201 
0202 The 3 routing actions that are supported are "trap", "drop" and "redirect".
0203 
0204 Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the
0205 CPU and to swp3. This type of key (DA only) is used when the port's VLAN
0206 awareness state is off::
0207 
0208   tc qdisc add dev swp2 clsact
0209   tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \
0210           action mirred egress redirect dev swp3 \
0211           action trap
0212 
0213 Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID
0214 of 100 and a PCP of 0::
0215 
0216   tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \
0217           dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop
0218 
0219 Time-based ingress policing
0220 ---------------------------
0221 
0222 The TTEthernet hardware abilities of the switch can be constrained to act
0223 similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in
0224 IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform
0225 tight timing-based admission control for up to 1024 flows (identified by a
0226 tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which
0227 are received outside their expected reception window are dropped.
0228 
0229 This capability can be managed through the offload of the tc-gate action. As
0230 routing actions are intrinsic to virtual links in TTEthernet (which performs
0231 explicit routing of time-critical traffic and does not leave that in the hands
0232 of the FDB, flooding etc), the tc-gate action may never appear alone when
0233 asking sja1105 to offload it. One (or more) redirect or trap actions must also
0234 follow along.
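
The rules for combining actions can be summarized by a small classifier. This is only a reading of the text above, not the driver's logic; ``vl_kind`` is a hypothetical helper that takes a space-separated list of action names:

```shell
# Classify a list of tc actions as described in the text.
# Prints "non-critical", "time-critical" or "invalid".
vl_kind() {
        local action has_gate=0 has_route=0 has_drop=0
        for action in $1; do
                case "${action}" in
                gate)            has_gate=1 ;;
                redirect | trap) has_route=1 ;;
                drop)            has_drop=1 ;;
                *)               echo "invalid"; return 0 ;;
                esac
        done
        if [ "${has_gate}" -eq 1 ]; then
                # tc-gate needs at least one redirect or trap action next to it.
                if [ "${has_route}" -eq 1 ]; then
                        echo "time-critical"
                else
                        echo "invalid"
                fi
        elif [ "${has_route}" -eq 1 ] || [ "${has_drop}" -eq 1 ]; then
                echo "non-critical"
        else
                echo "invalid"
        fi
}

vl_kind "redirect trap"   # non-critical
vl_kind "gate trap"       # time-critical
vl_kind "gate"            # invalid
```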
0235 
0236 Example: create a tc-taprio schedule that is phase-aligned with a tc-gate
0237 schedule (the clocks must be synchronized by a 1588 application stack, which is
0238 outside the scope of this document). No packet delivered by the sender will be
0239 dropped. Note that the reception window is larger than the transmission window
0240 (and much more so, in this example) to compensate for the packet propagation
0241 delay of the link (which can be determined by the 1588 application stack).
0242 
0243 Receiver (sja1105)::
0244 
0245   tc qdisc add dev swp2 clsact
0246   now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
0247           sec=$(echo $now | awk -F. '{print $1}') && \
0248           base_time="$(((sec + 2) * 1000000000))" && \
0249           echo "base time ${base_time}"
0250   tc filter add dev swp2 ingress flower skip_sw \
0251           dst_mac 42:be:24:9b:76:20 \
0252           action gate base-time ${base_time} \
0253           sched-entry OPEN  60000 -1 -1 \
0254           sched-entry CLOSE 40000 -1 -1 \
0255           action trap
0256 
0257 Sender::
0258 
0259   now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
0260           sec=$(echo $now | awk -F. '{print $1}') && \
0261           base_time="$(((sec + 2) * 1000000000))" && \
0262           echo "base time ${base_time}"
0263   tc qdisc add dev eno0 parent root taprio \
0264           num_tc 8 \
0265           map 0 1 2 3 4 5 6 7 \
0266           queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
0267           base-time ${base_time} \
0268           sched-entry S 01  50000 \
0269           sched-entry S 00  50000 \
0270           flags 2
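
The timing relationship between the two schedules can be verified with some arithmetic on the values used above. The variable names are only for illustration:

```shell
rx_open=60000; rx_close=40000   # receiver tc-gate entries (ns)
tx_open=50000; tx_close=50000   # sender tc-taprio entries (ns)

rx_cycle=$((rx_open + rx_close))
tx_cycle=$((tx_open + tx_close))
margin=$((rx_open - tx_open))

echo "receiver cycle: ${rx_cycle} ns, sender cycle: ${tx_cycle} ns"
echo "extra reception window: ${margin} ns"
```

Both cycles are 100 us long, so the phase alignment established by the common base time is preserved from one cycle to the next, and the reception window stays open for 10 us after the transmission window has closed.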
0271 
0272 The engine used to schedule the ingress gate operations is the same as the
0273 one used for the tc-taprio offload. Therefore, the restrictions regarding the
0274 fact that no two gate actions (either tc-gate or tc-taprio gates) may fire at
0275 the same time (during the same 200 ns slot) still apply.
0276 
0277 For convenience, it is possible to share time-triggered virtual links across
0278 more than 1 ingress port, via flow blocks. In this case, the restriction of
0279 firing at the same time does not apply because there is a single schedule in
0280 the system, that of the shared virtual link::
0281 
0282   tc qdisc add dev swp2 ingress_block 1 clsact
0283   tc qdisc add dev swp3 ingress_block 1 clsact
0284   tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \
0285           action gate index 2 \
0286           base-time 0 \
0287           sched-entry OPEN 50000000 -1 -1 \
0288           sched-entry CLOSE 50000000 -1 -1 \
0289           action trap
0290 
0291 Hardware statistics for each flow are also available ("pkts" counts the number
0292 of dropped frames, which is a sum of frames dropped due to timing violations,
0293 lack of destination ports and MTU enforcement checks). Byte-level counters are
0294 not available.
0295 
0296 Limitations
0297 ===========
0298 
0299 The SJA1105 switch family always performs VLAN processing. When configured as
0300 VLAN-unaware, frames carry a different VLAN tag internally, depending on
0301 whether the port is standalone or under a VLAN-unaware bridge.
0302 
0303 The virtual link keys are always fixed at {MAC DA, VLAN ID, VLAN PCP}, but the
0304 driver asks for the VLAN ID and VLAN PCP when the port is under a VLAN-aware
0305 bridge. Otherwise, it fills in the VLAN ID and PCP automatically, based on
0306 whether the port is standalone or in a VLAN-unaware bridge, and accepts only
0307 "VLAN-unaware" tc-flower keys (MAC DA).
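
The choice of key can be sketched as follows. ``vl_filter_cmd`` is a hypothetical helper that prints a tc-flower command in the form used by the examples in this document; the VLAN awareness state is passed in by the caller rather than queried from the system, and the action is fixed to ``trap`` for brevity:

```shell
# Print the tc-flower command for a virtual link rule on <port>.
# Usage: vl_filter_cmd <port> <vlan_aware: 0|1> <dst_mac> [<vid> <pcp>]
vl_filter_cmd() {
        local port="$1" vlan_aware="$2" dmac="$3" vid="${4:-}" pcp="${5:-}"

        if [ "${vlan_aware}" -eq 1 ]; then
                # VLAN-aware bridge: the user must supply VID and PCP.
                if [ -z "${vid}" ] || [ -z "${pcp}" ]; then
                        echo "VLAN-aware keys need a VID and a PCP" >&2
                        return 1
                fi
                echo "tc filter add dev ${port} ingress protocol 802.1Q flower skip_sw" \
                        "dst_mac ${dmac} vlan_id ${vid} vlan_prio ${pcp} action trap"
        else
                # Standalone or VLAN-unaware bridge: MAC DA only.
                echo "tc filter add dev ${port} ingress flower skip_sw" \
                        "dst_mac ${dmac} action trap"
        fi
}

vl_filter_cmd swp2 0 42:be:24:9b:76:20
vl_filter_cmd swp2 1 42:be:24:9b:76:20 100 0
```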
0308 
0309 The existing tc-flower keys that are offloaded using virtual links are no
0310 longer operational after one of the following happens:
0311 
0312 - port was standalone and joins a bridge (VLAN-aware or VLAN-unaware)
0313 - port is part of a bridge whose VLAN awareness state changes
0314 - port was part of a bridge and becomes standalone
0315 - port was standalone, but another port joins a VLAN-aware bridge and this
0316   changes the global VLAN awareness state of the bridge
0317 
0318 The driver cannot veto all these operations, and it cannot update/remove the
0319 existing tc-flower filters either. So for proper operation, the tc-flower
0320 filters should be installed only after the forwarding configuration of the port
0321 has been made, and removed by user space before making any changes to it.
0322 
0323 Device Tree bindings and board design
0324 =====================================
0325 
0326 This section references ``Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml``
0327 and aims to showcase some potential switch caveats.
0328 
0329 RMII PHY role and out-of-band signaling
0330 ---------------------------------------
0331 
0332 In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by
0333 an external oscillator (but not by the PHY).
0334 But the spec is rather loose and devices go outside it in several ways.
0335 Some PHYs go against the spec and may provide an output pin where they source
0336 the 50 MHz clock themselves, in an attempt to be helpful.
0337 On the other hand, the SJA1105 is only binary configurable - when in the RMII
0338 MAC role it will also attempt to drive the clock signal. To prevent this from
0339 happening it must be put in RMII PHY role.
0340 But doing so has some unintended consequences.
0341 In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0].
0342 These are practically some extra code words (/J/ and /K/) sent prior to the
0343 preamble of each frame. The MAC does not have this out-of-band signaling
0344 mechanism defined by the RMII spec.
0345 So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the
0346 clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105
0347 emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to
0348 frame preambles, which the real PHY is not expected to understand. So the PHY
0349 simply encodes the extra symbols received from the SJA1105-as-PHY onto the
0350 100Base-Tx wire.
0351 On the other side of the wire, some link partners might discard these extra
0352 symbols, while others might choke on them and discard the entire Ethernet
0353 frames that follow along. This looks like packet loss with some link partners
0354 but not with others.
0355 The take-away is that in RMII mode, the SJA1105 must be allowed to drive the
0356 reference clock if connected to a PHY.
0357 
0358 RGMII fixed-link and internal delays
0359 ------------------------------------
0360 
0361 As mentioned in the bindings document, the second generation of devices has
0362 tunable delay lines as part of the MAC, which can be used to establish the
0363 correct RGMII timing budget.
0364 When powered up, these can shift the Rx and Tx clocks with a phase difference
0365 between 73.8 and 101.7 degrees.
0366 The catch is that the delay lines need to lock onto a clock signal with a
0367 stable frequency. This means that there must be at least 2 microseconds of
0368 silence between the clock at the old vs at the new frequency. Otherwise the
0369 lock is lost and the delay lines must be reset (powered down and back up).
0370 In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25
0371 MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the
0372 AN process.
0373 In the situation where the switch port is connected through an RGMII fixed-link
0374 to a link partner whose link state life cycle is outside the control of Linux
0375 (such as a different SoC), then the delay lines would remain unlocked (and
0376 inactive) until there is manual intervention (ifdown/ifup on the switch port).
0377 The take-away is that in RGMII mode, the switch's internal delays are only
0378 reliable if the link partner never changes link speeds, or if it does, it does
0379 so in a way that is coordinated with the switch port (practically, both ends of
0380 the fixed-link are under control of the same Linux system).
0381 As to why a fixed-link interface would ever change link speeds: there are
0382 Ethernet controllers out there which come out of reset in 100 Mbps mode, and
0383 their driver inevitably needs to change the speed and clock frequency if it's
0384 required to work at gigabit.
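
The speed-to-clock relationship and the manual recovery mentioned above can be written down as two small helpers. Both function names are hypothetical, and ``swp4`` is a placeholder port name; the second helper only prints the iproute2 equivalent of an ifdown/ifup cycle:

```shell
# RGMII clock frequency (Hz) for a given link speed (Mbps), per the text above.
rgmii_clock_hz() {
        case "$1" in
        1000) echo 125000000 ;;
        100)  echo 25000000 ;;
        10)   echo 2500000 ;;
        *)    echo "unsupported speed: $1" >&2; return 1 ;;
        esac
}

# Print the commands for the manual recovery (ifdown/ifup on the switch port).
relock_cmds() {
        echo "ip link set dev $1 down"
        echo "ip link set dev $1 up"
}

echo "100 Mbps -> $(rgmii_clock_hz 100) Hz, 1000 Mbps -> $(rgmii_clock_hz 1000) Hz"
relock_cmds swp4
```

A link partner that comes out of reset at 100 Mbps and then moves to gigabit therefore changes the clock from 25 MHz to 125 MHz, which is the kind of transition the delay lines cannot follow without being reset.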
0385 
0386 MDIO bus and PHY management
0387 ---------------------------
0388 
0389 The SJA1105 does not have an MDIO bus and does not perform in-band AN either.
0390 Therefore there is no link state notification coming from the switch device.
0391 A board would need to hook up the PHYs connected to the switch to any other
0392 MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO
0393 bus). Link state management then works by the driver manually keeping in sync
0394 (over SPI commands) the MAC link speed with the settings negotiated by the PHY.
0395 
0396 By comparison, the SJA1110 supports an MDIO slave access point over which its
0397 internal 100base-T1 PHYs can be accessed from the host. This is, however, not
0398 used by the driver; instead, the internal 100base-T1 and 100base-TX PHYs are
0399 accessed through SPI commands, modeled in Linux as virtual MDIO buses.
0400 
0401 The microcontroller attached to the SJA1110 port 0 also has an MDIO controller
0402 operating in master mode; however, the driver does not support this either,
0403 since the microcontroller gets disabled when the Linux driver operates.
0404 Discrete PHYs connected to the switch ports should have their MDIO interface
0405 attached to an MDIO controller from the host system and not to the switch,
0406 similar to SJA1105.
0407 
0408 Port compatibility matrix
0409 -------------------------
0410 
0411 The SJA1105 port compatibility matrix is:
0412 
0413 ===== ============== ============== ==============
0414 Port   SJA1105E/T     SJA1105P/Q     SJA1105R/S
0415 ===== ============== ============== ==============
0416 0      xMII           xMII           xMII
0417 1      xMII           xMII           xMII
0418 2      xMII           xMII           xMII
0419 3      xMII           xMII           xMII
0420 4      xMII           xMII           SGMII
0421 ===== ============== ============== ==============
0422 
0423 
0424 The SJA1110 port compatibility matrix is:
0425 
0426 ===== ============== ============== ============== ==============
0427 Port   SJA1110A       SJA1110B       SJA1110C       SJA1110D
0428 ===== ============== ============== ============== ==============
0429 0      RevMII (uC)    RevMII (uC)    RevMII (uC)    RevMII (uC)
0430 1      100base-TX     100base-TX     100base-TX
0431        or SGMII                                     SGMII
0432 2      xMII           xMII           xMII           xMII
0433        or SGMII                                     or SGMII
0434 3      xMII           xMII           xMII
0435        or SGMII       or SGMII                      SGMII
0436        or 2500base-X  or 2500base-X                 or 2500base-X
0437 4      SGMII          SGMII          SGMII          SGMII
0438        or 2500base-X  or 2500base-X  or 2500base-X  or 2500base-X
0439 5      100base-T1     100base-T1     100base-T1     100base-T1
0440 6      100base-T1     100base-T1     100base-T1     100base-T1
0441 7      100base-T1     100base-T1     100base-T1     100base-T1
0442 8      100base-T1     100base-T1     n/a            n/a
0443 9      100base-T1     100base-T1     n/a            n/a
0444 10     100base-T1     n/a            n/a            n/a
0445 ===== ============== ============== ============== ==============