.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=============================================
Chelsio N210 10Gb Ethernet Network Controller
=============================================

Driver Release Notes for Linux

Version 2.1.1

June 20, 2005

.. Contents

 INTRODUCTION
 FEATURES
 PERFORMANCE
 DRIVER MESSAGES
 KNOWN ISSUES
 SUPPORT


Introduction
============

This document describes the Linux driver for the Chelsio 10Gb Ethernet
Network Controller. This driver supports the Chelsio N210 NIC and is backward
compatible with the Chelsio N110 model 10Gb NICs.


Features
========

Adaptive Interrupts (adaptive-rx)
---------------------------------

This feature provides an adaptive algorithm that adjusts the interrupt
coalescing parameters, allowing the driver to dynamically adapt the latency
settings to achieve the highest performance during various types of network
load.

The interface used to control this feature is ethtool. Please see the
ethtool manpage for additional usage information.

By default, adaptive-rx is disabled.
To enable adaptive-rx::

    ethtool -C <interface> adaptive-rx on

To disable adaptive-rx, use ethtool::

    ethtool -C <interface> adaptive-rx off

After disabling adaptive-rx, the timer latency value will be set to 50us.
You may set the timer latency after disabling adaptive-rx::

    ethtool -C <interface> rx-usecs <microseconds>

An example to set the timer latency value to 100us on eth0::

    ethtool -C eth0 rx-usecs 100

You may also provide a timer latency value while disabling adaptive-rx::

    ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

If adaptive-rx is disabled and a timer latency value is specified, the timer
will be set to the specified value until changed by the user or until
adaptive-rx is enabled.

To view the status of the adaptive-rx and timer latency values::

    ethtool -c <interface>
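
When these settings are scripted, it can be useful to read back the value
that the driver reports. The following is only a sketch: the helper name
``rx_usecs_of`` is hypothetical, and it assumes that ``ethtool -c`` prints a
line of the form ``rx-usecs: <value>``, which may differ between ethtool
versions.

```shell
#!/bin/sh
# Hypothetical helper: extract the rx-usecs value from "ethtool -c" output
# read on standard input. Assumes a line of the form "rx-usecs: <value>".
rx_usecs_of() {
    awk -F': *' '$1 == "rx-usecs" { print $2; exit }'
}

# Example use on a live system (eth0 is a placeholder interface name):
#   ethtool -C eth0 adaptive-rx off rx-usecs 100
#   ethtool -c eth0 | rx_usecs_of
```

The helper prints nothing if no ``rx-usecs`` line is found, so an empty
result should be treated as "unknown" rather than as zero.
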


TCP Segmentation Offloading (TSO) Support
-----------------------------------------

This feature, also known as "large send", enables a system's protocol stack
to offload portions of outbound TCP processing to a network interface card,
thereby reducing system CPU utilization and enhancing performance.

The interface used to control this feature is ethtool version 1.8 or higher.
Please see the ethtool manpage for additional usage information.

By default, TSO is enabled.
To disable TSO::

    ethtool -K <interface> tso off

To enable TSO::

    ethtool -K <interface> tso on

To view the status of TSO::

    ethtool -k <interface>


Performance
===========

The following information is provided as an example of how to change system
parameters for "performance tuning" and what values to use. You may or may
not want to change these system parameters, depending on your
server/workstation application. Doing so is not warranted in any way by
Chelsio Communications, and is done at "YOUR OWN RISK". Chelsio will not be
held responsible for loss of data or damage to equipment.

Your distribution may have a different way of doing things, or you may prefer
a different method. These commands are shown only to provide an example of
what to do and are by no means definitive.

Any of the following system changes will last only until you reboot your
system. You may want to write a script that runs at boot-up and applies the
optimal settings for your system.
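
A boot-time script of this kind might look like the sketch below. It reuses
a few of the example sysctl values from this section. The function name
``apply_tuning`` and the ``DRY_RUN`` variable are hypothetical, and how the
script is hooked into boot depends on your distribution.

```shell
#!/bin/sh
# Sketch of a boot-time tuning script. The sysctl keys and values are
# examples taken from this document; adjust or remove them for your workload.
# Set DRY_RUN=1 to print the commands instead of running them.
apply_tuning() {
    run=
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run=echo
    fi
    $run sysctl -w net.ipv4.tcp_timestamps=0
    $run sysctl -w net.ipv4.tcp_sack=0
    $run sysctl -w net.core.rmem_max=1024000
    $run sysctl -w net.core.wmem_max=1024000
}

# Call apply_tuning (as root) from whatever boot hook your system provides.
```

Running ``DRY_RUN=1 apply_tuning`` first lets you review the exact commands
before they change any kernel parameters.
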

Setting PCI Latency Timer::

    setpci -d 1425:* 0x0c.l=0x0000F800

Disabling TCP timestamp::

    sysctl -w net.ipv4.tcp_timestamps=0

Disabling SACK::

    sysctl -w net.ipv4.tcp_sack=0

Setting a large number of incoming connection requests::

    sysctl -w net.ipv4.tcp_max_syn_backlog=3000

Setting maximum receive socket buffer size::

    sysctl -w net.core.rmem_max=1024000

Setting maximum send socket buffer size::

    sysctl -w net.core.wmem_max=1024000

Set smp_affinity (on a multiprocessor system) to a single CPU::

    echo 1 > /proc/irq/<interrupt_number>/smp_affinity
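
The value written to ``smp_affinity`` is a hexadecimal bitmask in which bit
``n`` selects CPU ``n``, so the ``1`` above selects CPU 0. The sketch below
computes the mask for another CPU. The helper name ``cpu_mask`` is
hypothetical, and the simple form shown is only meant for CPU indexes below
32.

```shell
#!/bin/sh
# Print the hexadecimal smp_affinity mask that selects a single CPU.
# CPU numbering starts at 0, so CPU n corresponds to bit n of the mask.
cpu_mask() {
    printf '%x\n' "$((1 << $1))"
}

# Example use (as root), binding the interrupt to CPU 2:
#   cpu_mask 2 > /proc/irq/<interrupt_number>/smp_affinity
```
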

Setting default receive socket buffer size::

    sysctl -w net.core.rmem_default=524287

Setting default send socket buffer size::

    sysctl -w net.core.wmem_default=524287

Setting maximum option memory buffers::

    sysctl -w net.core.optmem_max=524287

Setting maximum backlog (# of unprocessed packets before kernel drops)::

    sysctl -w net.core.netdev_max_backlog=300000

Setting TCP read buffers (min/default/max)::

    sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

Setting TCP write buffers (min/default/max)::

    sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

Setting TCP buffer space (min/pressure/max)::

    sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"

TCP window size for single connections:

The receive buffer (RX_WINDOW) size must be at least as large as the
Bandwidth-Delay Product of the communication link between the sender and
receiver. Due to the variations of RTT, you may want to increase the buffer
size up to 2 times the Bandwidth-Delay Product. Reference page 289 of
"TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.

At 10Gb speeds, use the following formula::

    RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
    Example for an RTT of 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000 bytes

RX_WINDOW sizes of 256KB - 512KB should be sufficient.

Setting the min, default, and max receive buffer (RX_WINDOW) size::

    sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"
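
The formula above can be evaluated with integer shell arithmetic, as in the
sketch below. A link speed of 1 Gbit/s carries 125 bytes per microsecond, so
the Bandwidth-Delay Product in bytes is the speed in Gbit/s times 125 times
the RTT in microseconds. The helper name ``bdp_bytes`` is hypothetical.

```shell
#!/bin/sh
# Bandwidth-Delay Product in bytes.
#   $1: link speed in Gbit/s (integer)
#   $2: round-trip time in microseconds (integer)
# 1 Gbit/s = 125 bytes per microsecond, so BDP = speed * 125 * RTT.
bdp_bytes() {
    echo "$(( $1 * 125 * $2 ))"
}

# Example: a 10Gb link with an RTT of 100us, as in the formula above.
#   bdp_bytes 10 100
```

For the 10Gb, 100us case this reproduces the 125,000 bytes of the worked
example. Doubling the result gives the upper bound suggested above for links
with variable RTT.
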

TCP window size for multiple connections:

The receive buffer (RX_WINDOW) size may be calculated in the same way as for
a single connection, but should be divided by the number of connections. The
smaller window prevents congestion and facilitates better pacing,
especially if/when MAC level flow control does not work well or when it is
not supported on the machine. Experimentation may be necessary to attain
the correct value. This method is provided as a starting point for the
correct receive buffer size.

Setting the min, default, and max receive buffer (RX_WINDOW) size is
performed in the same manner as for a single connection.
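
As a starting point for that experimentation, the per-connection window can
be sketched as the Bandwidth-Delay Product divided by the connection count.
The helper name ``per_conn_window`` is hypothetical, and the result is
rounded down by integer division.

```shell
#!/bin/sh
# Per-connection receive window in bytes: the link's Bandwidth-Delay Product
# divided by the number of concurrent connections (rounds down).
#   $1: link speed in Gbit/s
#   $2: round-trip time in microseconds
#   $3: number of connections (must be at least 1)
per_conn_window() {
    if [ "$3" -lt 1 ]; then
        echo "connection count must be at least 1" >&2
        return 1
    fi
    echo "$(( $1 * 125 * $2 / $3 ))"
}

# Example: four connections sharing a 10Gb link with an RTT of 100us.
#   per_conn_window 10 100 4
```
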


Driver Messages
===============

The following messages are the most common messages logged by syslog. These
may be found in /var/log/messages.

Driver up::

    Chelsio Network Driver - version 2.1.1

NIC detected::

    eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

Link up::

    eth#: link is up at 10 Gbps, full duplex

Link down::

    eth#: link is down
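
To pick these messages out of a busy log, a filter along the following lines
may help. It is only a sketch: the function name ``cxgb_messages`` is
hypothetical, the patterns are derived from the message texts listed above,
and the link-state pattern will also match similar messages from other
drivers.

```shell
#!/bin/sh
# Filter syslog text on standard input down to the driver messages listed
# above: the version banner, NIC detection, and link state changes.
cxgb_messages() {
    grep -E 'Chelsio (Network Driver|N[0-9]+ )|link is (up|down)'
}

# Example use (log path as given in this document):
#   cxgb_messages < /var/log/messages
```
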


Known Issues
============

These issues have been identified during testing. The following information
is provided as workarounds for them. In some cases, a problem is inherent to
Linux or to a particular Linux distribution and/or hardware platform.

1. Large number of TCP retransmits on a multiprocessor (SMP) system.

   On a system with multiple CPUs, the interrupt (IRQ) for the network
   controller may be bound to more than one CPU. This will cause TCP
   retransmits if the packet data is split across different CPUs and
   re-assembled in a different order than expected.

   To eliminate the TCP retransmits, set smp_affinity on the particular
   interrupt to a single CPU. You can locate the interrupt (IRQ) used on
   the N110/N210 by using ifconfig::

       ifconfig <dev_name> | grep Interrupt

   Set the smp_affinity to a single CPU::

       echo 1 > /proc/irq/<interrupt_number>/smp_affinity

   It is strongly recommended that you do not run the irqbalance daemon on
   your system, as it will change any smp_affinity setting you have applied.
   The irqbalance daemon runs on a 10 second interval and binds interrupts
   to the least loaded CPU determined by the daemon. To disable this daemon::

       chkconfig --level 2345 irqbalance off

   By default, some Linux distributions enable the kernel feature,
   irqbalance, which performs the same function as the daemon. To disable
   this feature, add the following option to the kernel command line in
   your bootloader configuration::

       noirqbalance

   Example using the Grub bootloader::

       title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
       root (hd0,0)
       kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
       initrd /initrd-2.4.21-27.ELsmp.img

2. After running insmod, the driver is loaded and the incorrect network
   interface is brought up without running ifup.

   When using 2.4.x kernels, including RHEL kernels, the Linux kernel
   invokes a script named "hotplug". This script is primarily used to
   automatically bring up USB devices when they are plugged in; however,
   the script also attempts to automatically bring up a network interface
   after loading the kernel module. The hotplug script does this by scanning
   the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
   for HWADDR=<mac_address>.

   If the hotplug script does not find the HWADDR within any of the
   ifcfg-eth# files, it will bring up the device with the next available
   interface name. If this interface is already configured for a different
   network card, your new interface will have an incorrect IP address and
   network settings.

   To solve this issue, you can add the HWADDR=<mac_address> key to the
   interface config file of your network controller.

   To disable this "hotplug" feature, you may add the driver (module name)
   to the "blacklist" file located in /etc/hotplug. It has been noted that
   this does not work for network devices because the net.agent script
   does not use the blacklist file. Instead, remove or rename the net.agent
   script located in /etc/hotplug to disable this feature.

3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
   on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.

   If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
   chipset, you may experience the "133-MHz Mode Split Completion Data
   Corruption" bug identified by AMD while using a 133 MHz PCI-X card on the
   PCI-X bus.

   AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
   can provide stale data via split completion cycles to a PCI-X card that
   is operating at 133 Mhz", causing data corruption.

   AMD provides three workarounds for this problem; however, Chelsio
   recommends the first option for best performance with this bug:

      For 133 MHz secondary bus operation, limit the transaction length and
      the number of outstanding transactions, via BIOS configuration
      programming of the PCI-X card, to the following:

         Data Length (bytes): 1k

         Total allowed outstanding transactions: 2

   Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
   section 56, "133-MHz Mode Split Completion Data Corruption" for more
   details on this bug and the workarounds suggested by AMD.

   It may be possible to work outside AMD's recommended PCI-X settings; try
   increasing the Data Length to 2k bytes for increased performance. If you
   have issues with these settings, please revert to the "safe" settings
   and duplicate the problem before submitting a bug or asking for support.

   .. note::

      The default setting on most systems is 8 outstanding transactions
      and 2k bytes data length.

4. On multiprocessor systems, it has been noted that an application which
   is handling 10Gb networking can switch between CPUs, causing degraded
   and/or unstable performance.

   If running on an SMP system and taking performance measurements, it
   is suggested you either run the latest netperf-2.4.0+ or use a binding
   tool such as Tim Hockin's procstate utilities (runon)
   <http://www.hockin.org/~thockin/procstate/>.

   Binding netserver and netperf (or other applications) to particular
   CPUs can make a significant difference in performance measurements.
   You may need to experiment to find which CPU to bind the application to
   in order to achieve the best performance for your system.

   If you are developing an application designed for 10Gb networking,
   please keep in mind you may want to look at the sched_setaffinity and
   sched_getaffinity system calls to bind your application.

   If you are just running user-space applications such as ftp, telnet,
   etc., you may want to try the runon tool provided by Tim Hockin's
   procstate utility. You could also try binding the interface to a
   particular CPU: ``runon 0 ifup eth0``
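
Where the runon tool is not available, the same kind of binding can be
sketched with ``taskset`` from util-linux. This is an assumption on our part:
``taskset`` is not mentioned in the original notes, and the wrapper name
``run_on_cpu`` is hypothetical.

```shell
#!/bin/sh
# Run a command pinned to a single CPU using taskset(1) from util-linux.
# Usage: run_on_cpu <cpu_index> <command> [args...]
run_on_cpu() {
    cpu=$1
    shift
    # Reject anything that is not a plain non-negative integer.
    case $cpu in
        ''|*[!0-9]*)
            echo "invalid CPU index: $cpu" >&2
            return 1
            ;;
    esac
    taskset -c "$cpu" "$@"
}

# Example use, binding netserver to CPU 0:
#   run_on_cpu 0 netserver
```

As with runon, the best CPU to bind to is found by experiment; a common
first attempt is the CPU that also services the NIC interrupt.
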


Support
=======

If you have problems with the software or hardware, please contact our
customer support team via email at support@chelsio.com or check our website
at http://www.chelsio.com

-------------------------------------------------------------------------------

::

    Chelsio Communications
    370 San Aleso Ave.
    Suite 100
    Sunnyvale, CA 94085
    http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS`` AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Copyright |copy| 2003-2005 Chelsio Communications. All rights reserved.