.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=============================================
Chelsio N210 10Gb Ethernet Network Controller
=============================================

Driver Release Notes for Linux

Version 2.1.1

June 20, 2005

.. Contents

 INTRODUCTION
 FEATURES
 PERFORMANCE
 DRIVER MESSAGES
 KNOWN ISSUES
 SUPPORT


Introduction
============

 This document describes the Linux driver for the Chelsio 10Gb Ethernet
 Network Controller. This driver supports the Chelsio N210 NIC and is
 backward compatible with the Chelsio N110 model 10Gb NICs.


Features
========

Adaptive Interrupts (adaptive-rx)
---------------------------------

  This feature provides an adaptive algorithm that adjusts the interrupt
  coalescing parameters, allowing the driver to dynamically adapt the latency
  settings to achieve the highest performance during various types of network
  load.

  The interface used to control this feature is ethtool. Please see the
  ethtool manpage for additional usage information.

  By default, adaptive-rx is disabled.
  To enable adaptive-rx::

      ethtool -C <interface> adaptive-rx on

  To disable adaptive-rx::

      ethtool -C <interface> adaptive-rx off

  After disabling adaptive-rx, the timer latency value will be set to 50us.
  You may set the timer latency after disabling adaptive-rx::

      ethtool -C <interface> rx-usecs <microseconds>

  An example to set the timer latency value to 100us on eth0::

      ethtool -C eth0 rx-usecs 100

  You may also provide a timer latency value while disabling adaptive-rx::

      ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

  If adaptive-rx is disabled and a timer latency value is specified, the timer
  will be set to the specified value until changed by the user or until
  adaptive-rx is enabled.

  To view the status of the adaptive-rx and timer latency values::

      ethtool -c <interface>
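
When checking the coalescing state from a script, the two values discussed above can be extracted from the ``ethtool -c`` output. The following is a minimal sketch only: the function name ``coalesce_summary`` is hypothetical, and it assumes the output contains lines of the form ``Adaptive RX: off  TX: off`` and ``rx-usecs: 50``, which may differ between ethtool versions.

```shell
#!/bin/sh
# Summarize adaptive-rx and rx-usecs from `ethtool -c <interface>` output
# read on stdin.
# Assumption: the output contains lines such as
#   "Adaptive RX: off  TX: off" and "rx-usecs: 50".
coalesce_summary() {
    awk '
        /^Adaptive RX:/ { print "adaptive-rx=" $3 }   # third field is the RX state
        /^rx-usecs:/    { print "rx-usecs=" $2 }      # second field is the timer value
    '
}
```

Possible usage: ``ethtool -c eth0 | coalesce_summary``.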


TCP Segmentation Offloading (TSO) Support
-----------------------------------------

  This feature, also known as "large send", enables a system's protocol stack
  to offload portions of outbound TCP processing to a network interface card,
  thereby reducing system CPU utilization and enhancing performance.

  The interface used to control this feature is ethtool version 1.8 or higher.
  Please see the ethtool manpage for additional usage information.

  By default, TSO is enabled.
  To disable TSO::

      ethtool -K <interface> tso off

  To enable TSO::

      ethtool -K <interface> tso on

  To view the status of TSO::

      ethtool -k <interface>
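
To check the TSO state from a script, the relevant line can be picked out of the ``ethtool -k`` output. This is a sketch under assumptions: the function name ``tso_state`` is hypothetical, and the line is assumed to be labelled either ``tcp-segmentation-offload:`` or ``tcp segmentation offload:``, depending on the ethtool version.

```shell
#!/bin/sh
# Print the TSO state ("on" or "off") from `ethtool -k <interface>` output
# read on stdin.
# Assumption: the feature line starts with "tcp-segmentation-offload:" or
# "tcp segmentation offload:".
tso_state() {
    awk -F': *' '/^tcp[- ]segmentation[- ]offload:/ { print $2; exit }'
}
```

Possible usage: ``ethtool -k eth0 | tso_state``.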


Performance
===========

 The following information is provided as an example of how to change system
 parameters for "performance tuning" and what values to use. You may or may
 not want to change these system parameters, depending on your
 server/workstation application. Doing so is not warranted in any way by
 Chelsio Communications, and is done at "YOUR OWN RISK". Chelsio will not be
 held responsible for loss of data or damage to equipment.

 Your distribution may have a different way of doing things, or you may prefer
 a different method. These commands are shown only to provide an example of
 what to do and are by no means definitive.

 Any of the following system changes will only last until you reboot your
 system. You may want to write a script that runs at boot-up which includes
 the optimal settings for your system.

  Setting PCI Latency Timer::

      setpci -d 1425:* 0x0c.l=0x0000F800

  Disabling TCP timestamp::

      sysctl -w net.ipv4.tcp_timestamps=0

  Disabling SACK::

      sysctl -w net.ipv4.tcp_sack=0

  Setting large number of incoming connection requests::

      sysctl -w net.ipv4.tcp_max_syn_backlog=3000

  Setting maximum receive socket buffer size::

      sysctl -w net.core.rmem_max=1024000

  Setting maximum send socket buffer size::

      sysctl -w net.core.wmem_max=1024000

  Set smp_affinity (on a multiprocessor system) to a single CPU::

      echo 1 > /proc/irq/<interrupt_number>/smp_affinity

  Setting default receive socket buffer size::

      sysctl -w net.core.rmem_default=524287

  Setting default send socket buffer size::

      sysctl -w net.core.wmem_default=524287

  Setting maximum option memory buffers::

      sysctl -w net.core.optmem_max=524287

  Setting maximum backlog (# of unprocessed packets before kernel drops)::

      sysctl -w net.core.netdev_max_backlog=300000

  Setting TCP read buffers (min/default/max)::

      sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

  Setting TCP write buffers (min/default/max)::

      sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

  Setting TCP buffer space (min/pressure/max)::

      sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
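
Because these changes are lost at reboot, the sysctl settings above could be collected into a script that runs at boot-up. The following is a minimal sketch, not a definitive implementation: the names ``run``, ``apply_tuning``, and ``DRY_RUN`` are chosen here for illustration, the values are simply the example values from this section, and the device-specific setpci and smp_affinity steps are left out. By default the script only prints the commands.

```shell
#!/bin/sh
# Re-apply the example sysctl tuning values from this section.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 and
# run as root to actually apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"      # show the command instead of executing it
    else
        "$@"
    fi
}

apply_tuning() {
    run sysctl -w net.ipv4.tcp_timestamps=0
    run sysctl -w net.ipv4.tcp_sack=0
    run sysctl -w net.ipv4.tcp_max_syn_backlog=3000
    run sysctl -w net.core.rmem_max=1024000
    run sysctl -w net.core.wmem_max=1024000
    run sysctl -w net.core.rmem_default=524287
    run sysctl -w net.core.wmem_default=524287
    run sysctl -w net.core.optmem_max=524287
    run sysctl -w net.core.netdev_max_backlog=300000
    run sysctl -w "net.ipv4.tcp_rmem=10000000 10000000 10000000"
    run sysctl -w "net.ipv4.tcp_wmem=10000000 10000000 10000000"
    run sysctl -w "net.ipv4.tcp_mem=10000000 10000000 10000000"
}
```

Calling ``apply_tuning`` prints the twelve commands for review; calling it with ``DRY_RUN=0`` in the environment executes them.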

  TCP window size for single connections:

   The receive buffer (RX_WINDOW) size must be at least as large as the
   Bandwidth-Delay Product of the communication link between the sender and
   receiver. Due to the variations of RTT, you may want to increase the buffer
   size up to 2 times the Bandwidth-Delay Product. Reference page 289 of
   "TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.

   At 10Gb speeds, use the following formula::

       RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
       Example for an RTT of 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000

   RX_WINDOW sizes of 256KB - 512KB should be sufficient.

   Setting the min, default, and max receive buffer (RX_WINDOW) size::

       sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"
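
The formula above can be evaluated with shell integer arithmetic. This sketch takes the RTT in microseconds (at 10Gb/s, 1.25 MBytes per millisecond equals 1250 bytes per microsecond) and an optional multiplier for the "up to 2 times" headroom. The function name ``rx_window_bytes`` is hypothetical.

```shell
#!/bin/sh
# Bandwidth-Delay Product for a 10Gb/s link, in bytes.
#   $1: RTT in microseconds
#   $2: optional integer headroom factor (default 1, e.g. 2 for RTT variation)
# 10Gb/s = 1.25 MBytes per millisecond = 1250 bytes per microsecond.
rx_window_bytes() {
    echo $(( 1250 * $1 * ${2:-1} ))
}
```

For the 100us example, ``rx_window_bytes 100`` gives 125000, and ``rx_window_bytes 100 2`` gives 250000.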

  TCP window size for multiple connections:

   The receive buffer (RX_WINDOW) size may be calculated the same as for
   single connections, but should be divided by the number of connections.
   The smaller window prevents congestion and facilitates better pacing,
   especially if/when MAC level flow control does not work well or when it is
   not supported on the machine. Experimentation may be necessary to attain
   the correct value. This method is provided as a starting point for the
   correct receive buffer size.

   Setting the min, default, and max receive buffer (RX_WINDOW) size is
   performed in the same manner as for a single connection.
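
As a starting point for that experimentation, the per-connection window can be computed by dividing the 10Gb/s Bandwidth-Delay Product by the number of connections. The function name ``per_connection_window`` is hypothetical, and the result is only the initial estimate described above, not a tuned value.

```shell
#!/bin/sh
# Starting-point receive window per connection, in bytes.
#   $1: RTT in microseconds
#   $2: number of concurrent connections (must be at least 1)
# Uses 1250 bytes per microsecond, i.e. a 10Gb/s link.
per_connection_window() {
    if [ "$2" -lt 1 ]; then
        echo "number of connections must be at least 1" >&2
        return 1
    fi
    echo $(( 1250 * $1 / $2 ))
}
```

For example, ``per_connection_window 100 4`` divides the 125000-byte window of the 100us example across four connections.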


Driver Messages
===============

 The following are the most common driver messages logged by syslog. These
 may be found in /var/log/messages.

  Driver up::

     Chelsio Network Driver - version 2.1.1

  NIC detected::

     eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

  Link up::

     eth#: link is up at 10 Gbps, full duplex

  Link down::

     eth#: link is down
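
To follow link state changes, the link up and link down messages shown above can be filtered out of the log. This is a sketch only: the function name ``link_events`` is hypothetical, and it assumes the interface is named ``eth`` followed by a number, as in the messages above.

```shell
#!/bin/sh
# Print only the link up/down messages from log text read on stdin.
# Assumption: interface names have the form eth<number>.
link_events() {
    grep -E 'eth[0-9]+: link is (up|down)'
}
```

Possible usage: ``link_events < /var/log/messages``.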


Known Issues
============

 These issues have been identified during testing. The following information
 is provided as a workaround for each problem. In some cases, the problem is
 inherent to Linux or to a particular Linux distribution and/or hardware
 platform.

  1. Large number of TCP retransmits on a multiprocessor (SMP) system.

      On a system with multiple CPUs, the interrupt (IRQ) for the network
      controller may be bound to more than one CPU. This can cause TCP
      retransmits if packet data is split across different CPUs and
      reassembled in a different order than expected.

      To eliminate the TCP retransmits, set smp_affinity on the particular
      interrupt to a single CPU. You can locate the interrupt (IRQ) used on
      the N110/N210 by using ifconfig::

          ifconfig <dev_name> | grep Interrupt

      Set the smp_affinity to a single CPU::

          echo 1 > /proc/irq/<interrupt_number>/smp_affinity

      It is strongly recommended that you do not run the irqbalance daemon on
      your system, as this will change any smp_affinity setting you have
      applied. The irqbalance daemon runs on a 10 second interval and binds
      interrupts to the least loaded CPU determined by the daemon. To disable
      this daemon::

          chkconfig --level 2345 irqbalance off

      By default, some Linux distributions enable the kernel feature,
      irqbalance, which performs the same function as the daemon. To disable
      this feature, add the following option to the kernel command line in
      your bootloader configuration::

          noirqbalance

      Example using the Grub bootloader::

          title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
          root (hd0,0)
          kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
          initrd /initrd-2.4.21-27.ELsmp.img

  2. After running insmod, the driver is loaded and the incorrect network
     interface is brought up without running ifup.

      When using 2.4.x kernels, including RHEL kernels, the Linux kernel
      invokes a script named "hotplug". This script is primarily used to
      automatically bring up USB devices when they are plugged in; however,
      the script also attempts to automatically bring up a network interface
      after loading the kernel module. The hotplug script does this by scanning
      the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
      for HWADDR=<mac_address>.

      If the hotplug script does not find the HWADDR within any of the
      ifcfg-eth# files, it will bring up the device with the next available
      interface name. If this interface is already configured for a different
      network card, your new interface will have an incorrect IP address and
      network settings.

      To solve this issue, you can add the HWADDR=<mac_address> key to the
      interface config file of your network controller.

      In principle, you can disable this "hotplug" feature by adding the
      driver (module name) to the "blacklist" file located in /etc/hotplug.
      However, it has been noted that this does not work for network devices
      because the net.agent script does not use the blacklist file. Instead,
      remove or rename the net.agent script located in /etc/hotplug to
      disable this feature.

  3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
     on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.

      If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
      chipset, you may experience the "133-MHz Mode Split Completion Data
      Corruption" bug identified by AMD while using a 133MHz PCI-X card on the
      PCI-X bus.

      AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
      can provide stale data via split completion cycles to a PCI-X card that
      is operating at 133 Mhz", causing data corruption.

      AMD provides three workarounds for this problem; however, Chelsio
      recommends the first option for best performance with this bug:

        For 133MHz secondary bus operation, limit the transaction length and
        the number of outstanding transactions, via BIOS configuration
        programming of the PCI-X card, to the following:

           Data Length (bytes): 1k

           Total allowed outstanding transactions: 2

      Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
      section 56, "133-MHz Mode Split Completion Data Corruption" for more
      details on this bug and the workarounds suggested by AMD.

      It may be possible to work outside AMD's recommended PCI-X settings. Try
      increasing the Data Length to 2k bytes for increased performance. If you
      have issues with these settings, please revert to the "safe" settings
      and duplicate the problem before submitting a bug or asking for support.

      .. note::

            The default setting on most systems is 8 outstanding transactions
            and 2k bytes data length.

  4. On multiprocessor systems, it has been noted that an application which
     is handling 10Gb networking can switch between CPUs causing degraded
     and/or unstable performance.

      If running on an SMP system and taking performance measurements, it
      is suggested you either run the latest netperf-2.4.0+ or use a binding
      tool such as Tim Hockin's procstate utilities (runon)
      <http://www.hockin.org/~thockin/procstate/>.

      Binding netserver and netperf (or other applications) to particular
      CPUs will make a significant difference in performance measurements.
      You may need to experiment to find which CPU to bind the application to
      in order to achieve the best performance for your system.

      If you are developing an application designed for 10Gb networking,
      please keep in mind you may want to look at the sched_setaffinity and
      sched_getaffinity system calls to bind your application.

      If you are just running user-space applications such as ftp, telnet,
      etc., you may want to try the runon tool provided by Tim Hockin's
      procstate utility. You could also try binding the interface to a
      particular CPU: ``runon 0 ifup eth0``
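
The value written to smp_affinity in issue 1 is a hexadecimal CPU bitmask, so ``echo 1`` selects CPU 0. To bind the interrupt to a different single CPU, the mask can be computed as follows. This is a sketch: the function name ``cpu_affinity_mask`` is hypothetical, and it assumes CPU numbers below 32 so that the mask fits in a single 32-bit group.

```shell
#!/bin/sh
# Print the hexadecimal smp_affinity mask that selects a single CPU.
#   $1: zero-based CPU number (assumed to be below 32)
cpu_affinity_mask() {
    printf '%x\n' "$(( 1 << $1 ))"
}
```

For example, ``cpu_affinity_mask 2`` selects the third CPU; the result can then be written to ``/proc/irq/<interrupt_number>/smp_affinity`` in the same way as the ``echo 1`` command shown in issue 1.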


Support
=======

 If you have problems with the software or hardware, please contact our
 customer support team via email at support@chelsio.com or check our website
 at http://www.chelsio.com

-------------------------------------------------------------------------------

::

 Chelsio Communications
 370 San Aleso Ave.
 Suite 100
 Sunnyvale, CA 94085
 http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS`` AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Copyright |copy| 2003-2005 Chelsio Communications. All rights reserved.