The server I used for all runs is an AMD K6-3/400-based machine with 128 MBytes of RAM and a DEC Tulip-based PCI network card. Running against e.g. a DEC Alpha AXP/150 with an EISA Etherlink III, this machine easily reaches data rates of 1100..1200 KBytes/s on a 10 Mbps Ethernet, very close to the theoretical maximum. So when we vary the client side, we can be sure that the server is not limiting the performance.
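To put "very close to the theoretical maximum" into numbers, here is a rough back-of-the-envelope sketch. It assumes standard Ethernet framing, a 1500-byte MTU, IPv4 and TCP headers without options, and that a KByte means 1024 bytes; none of these details are stated in the measurements themselves, so treat the result as an estimate only.

```python
# Rough ceiling for TCP payload throughput on 10 Mbps Ethernet.
# All constants below are standard framing sizes; the choice of
# "no TCP options" and "1 KByte = 1024 bytes" is an assumption.
LINK_BITS_PER_S = 10_000_000
PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter
ETH_HEADER = 14       # destination, source, type
FCS = 4               # frame check sequence
INTERFRAME_GAP = 12   # mandatory idle time, expressed in bytes
MTU = 1500
IP_HEADER = 20
TCP_HEADER = 20       # assumes no TCP options

def max_tcp_payload_rate(link_bits_per_s=LINK_BITS_PER_S):
    """Return the best-case TCP payload rate in bytes per second."""
    payload = MTU - IP_HEADER - TCP_HEADER
    on_wire = PREAMBLE_SFD + ETH_HEADER + MTU + FCS + INTERFRAME_GAP
    return link_bits_per_s / 8 * payload / on_wire

raw_kib_s = LINK_BITS_PER_S / 8 / 1024
tcp_kib_s = max_tcp_payload_rate() / 1024

print(f"raw wire rate:    {raw_kib_s:7.1f} KByte/s")
print(f"TCP payload rate: {tcp_kib_s:7.1f} KByte/s")
```

If TCP options are in use, the payload per frame shrinks by a few bytes and the ceiling drops slightly.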
For all tests, both the server and the clients run Linux with a 2.2 kernel. Since netio also runs on other platforms, you might want to run your own tests if Linux is not your favourite OS (bah ;-) ).
When comparing Ethernet boards, the maximum possible data rate might not be the only interesting result. Except for some high-end server models, all Ethernet interface boards place the burden of transferring data between the card's buffers and memory on the CPU. Depending on the card's design, this steals more or less of your CPU's time. I therefore made all measurements on three different machines:
Board | 1K Packets (KB/s) | 2K Packets (KB/s) | 4K Packets (KB/s) | 8K Packets (KB/s) | 16K Packets (KB/s) | 32K Packets (KB/s) | CPU Load (%) |
---|---|---|---|---|---|---|---|
3C523 (old) | 415.3 | 546.1 | 585.5 | 589.2 | 600.6 | 606.7 | 98.4 |
3C523 | 407.4 | 568.1 | 620.5 | 630.1 | 638.8 | 655.9 | 98.2 |
3C527 | 429.0 | 557.8 | 597.5 | 608.0 | 608.7 | 627.0 | 98.8 |
3C529 | 446.6 | 622.7 | 667.3 | 667.5 | 673.3 | 685.5 | 98.3 |
NE/2 | 365.8 | 492.4 | 534.6 | 541.2 | 542.2 | 563.6 | 98.1 |
DE-320 | 413.3 | 576.0 | 635.0 | 637.8 | 643.5 | 657.7 | 98.4 |
SMC 8013 | 414.7 | 377.1 | 581.5 | 589.9 | 586.3 | 585.1 | 98.0 |
Ethernet Adapter/A | 399.2 | 431.8 | 582.3 | 588.4 | 587.4 | 614.3 | 98.5 |
SKnet | 365.3 | 469.8 | 481.8 | 493.5 | 515.6 | 513.3 | 98.8 |
LAN Adapter/A | 424.7 | 555.2 | 601.6 | 604.0 | 615.3 | 637.5 | 98.8 |
EtherExpress | 417.4 | 542.1 | 597.5 | 599.5 | 597.8 | 627.7 | 98.5 |
DE-210 | 411.1 | 518.9 | 562.2 | 559.2 | 544.7 | 573.9 | 98.4 |
As one can see from the CPU load results, the slow CPU is the limiting factor in all cases, and the values are far away from the theoretical maximum throughput. Older (read: ISA-derived) designs like the SKnet and NE/2 perform significantly worse than the others, an indication that transferring data from/to these cards is loaded with more wait states than on designs that exploit the higher speed of the MCA bus compared to ISA. The winner in this scenario is the 3C529, closely followed by the 3C523, 3C527, DE-320 and LAN/A. Though not dramatically slower, the old, 386-only 3C523 measurably lags behind the newer revision. The performance of the EtherExpress is similar to the old 3C523. One reason might be the use of the same Ethernet controller chip, an Intel 82586; however, the Intel card uses programmed I/O instead of shared memory for data transfer - an example that PIO need not be slower than direct access to the card's packet buffer memory. Furthermore, there is no noticeable difference between the 8013 and 8003 - if there are differences between an 8- and 16-bit card, they don't show up on this platform...
Given that the i386SX-20 was the limiting factor in the first setup, we may expect substantially better results from an i486SX-33, which is proven by the tests on the 77:
Board | 1K Packets (KB/s) | 2K Packets (KB/s) | 4K Packets (KB/s) | 8K Packets (KB/s) | 16K Packets (KB/s) | 32K Packets (KB/s) | CPU Load (%) |
---|---|---|---|---|---|---|---|
3C523 | 1040.2 | 1082.9 | 1081.2 | 1081.5 | 1081.6 | 1079.1 | 44.6 |
3C527 | 1032.0 | 1074.0 | 1073.7 | 1069.8 | 1075.5 | 1075.1 | 49.1 |
3C529 | 899.9 | 950.4 | 950.0 | 947.6 | 945.3 | 945.5 | 28.1 |
NE/2 | 722.3 | 864.0 | 885.3 | 896.9 | 904.7 | 908.8 | 99.1 |
DE-320 | 1043.2 | 1087.3 | 1083.4 | 1082.6 | 1085.6 | 1087.9 | 44.7 |
SMC 8013 | 882.0 | 1082.0 | 1079.6 | 1077.8 | 1074.3 | 1080.7 | 66.2 |
Ethernet Adapter/A | 879.5 | 1081.5 | 1083.6 | 1081.0 | 1082.2 | 1082.1 | 77.8 |
SKnet | 778.7 | 797.8 | 796.3 | 801.0 | 794.5 | 805.7 | 41.1 |
LAN Adapter/A | 1068.3 | 1088.0 | 1088.0 | 1088.0 | 1088.8 | 1086.5 | 24.5 |
EtherExpress | 957.4 | 1088.3 | 1085.1 | 1088.0 | 1088.8 | 1087.9 | 53.0 |
DE-210 | 957.4 | 923.9 | 942.7 | 945.7 | 939.0 | 943.5 | 98.9 |
Now we're starting to talk at last! The CPU load for the SKnet board shows that this board has already reached its maximum performance of about 800 KBytes/s. In contrast, the NE/2 is capable of more, but it is still able to completely hog the CPU - not very nice in a server! The DE-320 however demonstrates that this is not an inherent property of the NE2000 design: its performance and CPU load are similar to the 3C523. Like the SKnet board, the 3C529 has reached its limit. At 950 KBytes/s, this is not bad, but other boards can do better. Interestingly enough, its predecessor, the 3C523, performs better (though with a substantial CPU overhead). The clear winner is the LAN/A: the data rate is the highest, and the CPU load is even lower than for the slower 3C529. The EtherExpress also performs well, but with a measurably higher CPU load than the 3C523 or LAN/A, plus lower performance for small frames.
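One way to weigh data rate against CPU load is to look at how many KBytes/s a board delivers per percentage point of CPU load. The sketch below does this for the i486SX-33 figures. Note the assumptions: the metric is only an illustration, and since the table gives a single CPU load per board, pairing it with the 32K-packet rate is a guess about which run it belongs to.

```python
# Figures copied from the i486SX-33 table:
# board -> (32K-packet rate in KB/s, CPU load in %).
# Pairing the CPU load with the 32K column is an assumption.
RESULTS_486 = {
    "3C523": (1079.1, 44.6),
    "3C527": (1075.1, 49.1),
    "3C529": (945.5, 28.1),
    "NE/2": (908.8, 99.1),
    "DE-320": (1087.9, 44.7),
    "SMC 8013": (1080.7, 66.2),
    "Ethernet Adapter/A": (1082.1, 77.8),
    "SKnet": (805.7, 41.1),
    "LAN Adapter/A": (1086.5, 24.5),
    "EtherExpress": (1087.9, 53.0),
    "DE-210": (943.5, 98.9),
}

def kb_per_cpu_percent(rate_kb_s, cpu_load_percent):
    """Throughput delivered per percentage point of CPU load."""
    return rate_kb_s / cpu_load_percent

# Boards sorted from most to least throughput per unit of CPU load.
ranking = sorted(
    RESULTS_486,
    key=lambda board: kb_per_cpu_percent(*RESULTS_486[board]),
    reverse=True,
)

for board in ranking:
    rate, load = RESULTS_486[board]
    ratio = kb_per_cpu_percent(rate, load)
    print(f"{board:20s} {rate:7.1f} KB/s {load:5.1f} %  {ratio:5.1f} KB/s per %")
```

A ratio like this flatters boards that are throttled by their own hardware rather than by the CPU, so it should be read together with the absolute data rate, not instead of it.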
WD8003 and 8013 do not differ significantly in their performance, but the 8003's CPU load is higher - a sign that the cards have reached their maximum internal transfer rate, but it takes the CPU more effort to stuff the data through the 8003's slower bus interface.
What does this somewhat surprising result tell us? Everybody's darling, the 3C529, is not the fastest card (at least for Linux); cards that use shared memory or a good PIO interface for transfer are the faster choice. The 3C529 is still not a bad choice, though; the difference mainly matters for people who want to drive it to the peak.
Another result is that the busmastering 3C527 produces significantly more load than other, non-busmastering boards! One explanation could be that controlling this board is more complex (not probable). Another reason might be that the driver does not exploit the board's bus mastering capabilities, i.e. the received frames are written to buffers kept by the driver, and the driver then copies the data into the operating system's buffers. This might sound awkward (and in fact it is, since it voids the advantages of busmaster operation), but it is sometimes unavoidable, either due to the way the kernel interface works or due to buffer alignment constraints... Still, this shouldn't obscure the fact that the 3C527 delivers good performance and deserves the designation 'High Performance Adapter'.
So let's see if a P90 can crank even more out of the cards:
Board | 1K Packets (KB/s) | 2K Packets (KB/s) | 4K Packets (KB/s) | 8K Packets (KB/s) | 16K Packets (KB/s) | 32K Packets (KB/s) | CPU Load (%) |
---|---|---|---|---|---|---|---|
3C523 | 1000.9 | 1026.7 | 1014.6 | 1013.1 | 1015.3 | 1016.1 | 5.1 |
3C527 | 1030.6 | 1074.5 | 1074.4 | 1072.4 | 1076.7 | 1077.7 | 8.8 |
3C529 | 929.0 | 959.7 | 959.3 | 956.8 | 959.1 | 959.0 | 5.1 |
NE/2 | 1046.5 | 1106.5 | 1105.9 | 1098.3 | 1100.1 | 1102.3 | 9.3 |
DE-320 | 1061.7 | 1109.3 | 1109.3 | 1106.0 | 1106.3 | 1108.2 | 6.6 |
SMC 8013 | 1048.9 | 1093.7 | 1093.4 | 1087.4 | 1089.5 | 1086.7 | 6.0 |
Ethernet Adapter/A | 1040.7 | 1107.5 | 1107.6 | 1104.6 | 1105.3 | 1105.5 | 5.6 |
SKnet | 793.6 | 804.2 | 808.9 | 803.5 | 818.9 | 799.9 | 5.2 |
LAN Adapter/A | 1099.9 | 1106.7 | 1107.7 | 1105.1 | 1107.6 | 1108.5 | 4.0 |
EtherExpress | 1070.3 | 1081.8 | 1081.1 | 1080.9 | 1081.8 | 1082.2 | 4.7 |
DE-210 | 887.8 | 950.2 | 926.1 | 912.8 | 917.7 | 921.1 | 29.4 |
Only the NE/2 shows a significant gain and now exhibits performance comparable to the LAN/A, yet with a substantially higher CPU load, while the 3C523 has finally reached its ceiling. The same is true for the EtherExpress, but its limit is a few KBytes/s higher. The 3C529's limit of around 950 KBytes/s is underlined by these values. Once again, the LAN/A shows the best performance paired with the lowest CPU overhead. Except for the NE/2 (and the DE-210!), overhead is not an issue for any card. Frustratingly, the 3C527, which should have the lowest CPU load, delivers one of the worst results - busmastering alone is no guarantee of a low CPU load; everything else (including the software implementation) has to cooperate! It remains to be seen whether IBM's EtherStreamer MC/32, another busmaster Ethernet adapter, could do better. I also have such an MC/32, but there is no Linux driver for it and IBM isn't very cooperative in releasing programming information :-(
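The claim that only the NE/2 gains significantly can be checked directly against the two tables. The following sketch compares the 32K-packet rates on the i486SX-33 and the P90; the choice of the 32K column is arbitrary and just an assumption for this illustration.

```python
# 32K-packet rates in KB/s, copied from the i486SX-33 and P90 tables.
RATE_486 = {
    "3C523": 1079.1, "3C527": 1075.1, "3C529": 945.5, "NE/2": 908.8,
    "DE-320": 1087.9, "SMC 8013": 1080.7, "Ethernet Adapter/A": 1082.1,
    "SKnet": 805.7, "LAN Adapter/A": 1086.5, "EtherExpress": 1087.9,
    "DE-210": 943.5,
}
RATE_P90 = {
    "3C523": 1016.1, "3C527": 1077.7, "3C529": 959.0, "NE/2": 1102.3,
    "DE-320": 1108.2, "SMC 8013": 1086.7, "Ethernet Adapter/A": 1105.5,
    "SKnet": 799.9, "LAN Adapter/A": 1108.5, "EtherExpress": 1082.2,
    "DE-210": 921.1,
}

def gain_percent(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0

gains = {board: gain_percent(RATE_486[board], RATE_P90[board])
         for board in RATE_486}

# Print boards from largest gain to largest loss.
for board, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{board:20s} {gain:+6.1f} %")
```

With these figures, the NE/2 is the only board that changes by more than a few percent in either direction.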
Packet Size (KByte) | Rate (KB/s) |
---|---|
1 | 2665.2 |
2 | 3013.8 |
4 | 3032.7 |
8 | 2960.9 |
16 | 2879.5 |
32 | 2937.4 |
Place 2 goes to the 3Com boards: While the 3C523 offers slightly better performance, the 3C529 scores with onboard 10BaseT and a smaller form factor (handy e.g. in the P70 portable PS/2!). I wouldn't make too much of the 3C529's lower performance: The rate is still more than you will ever get on a loaded 10 Mbps wire.
It's difficult to say whether the SKnet, DE-210 or the NE/2 is the worst board: Only 800 KBytes/s is already a notable loss, but at least the SKnet doesn't put as much load on the CPU as the NE/2 or DE-210. The NE/2 is like a lemon: if you squeeze it just enough, you will get what you want ;-) I wouldn't advise using either in a server...
The 3C529 and even more the DE-320/EtherExpress/8013 are good cards where the length of the card is an issue, like in the P70/75 portables: They fit into the 'half length' slots and still offer good performance.