A fast, low-latency network is an often underestimated but essential component of ML infrastructure, and it benefits other applications as well.
22.03.2024, 07:00 · Reading time: 23 min.
Cloud, high-performance computing and machine learning are converging. The trend has been apparent since 2014: on the one hand, researchers are discovering container technologies and concepts such as redundancy. On the other hand, managed-service and cloud providers in particular are finding that it pays off financially to use high-end, high-performance hardware that costs more to purchase but delivers disproportionately more resources. It is not uncommon for five racks to shrink to two or fewer.
From an infrastructure perspective, the most important but regularly underestimated component is a high-throughput, low-latency network. To the surprise of many IT departments, a lot has happened since 2004 – not only in throughput and latency, but also in the architecture of modern fabrics with their spine-leaf topology.
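To make the spine-leaf idea concrete before moving on: in such a fabric, every leaf (top-of-rack) switch connects to every spine switch, so any two servers on different leaves are exactly two hops apart and traffic can spread across as many equal-cost paths as there are spines. The following minimal Python sketch illustrates this; all port counts and link speeds are assumptions chosen for illustration, not figures from this article.

    # Back-of-the-envelope model of a two-tier spine-leaf fabric.
    # All values below are illustrative assumptions.
    SPINES = 4                 # number of spine switches
    UPLINK_GBITS = 400         # leaf-to-spine link speed in Gbit/s
    DOWNLINKS_PER_LEAF = 32    # server-facing ports per leaf
    DOWNLINK_GBITS = 100       # server-facing port speed in Gbit/s

    # Every leaf connects to every spine, so a flow between two leaves
    # can take any spine: one equal-cost path per spine switch.
    equal_cost_paths = SPINES

    # Oversubscription per leaf: server-facing vs. spine-facing capacity.
    downlink_capacity = DOWNLINKS_PER_LEAF * DOWNLINK_GBITS  # 3200 Gbit/s
    uplink_capacity = SPINES * UPLINK_GBITS                  # 1600 Gbit/s
    oversubscription = downlink_capacity / uplink_capacity   # 2.0

    print(f"{equal_cost_paths} equal-cost paths between any two leaves")
    print(f"oversubscription {oversubscription:.1f}:1 per leaf")

A ratio of 1:1 would make the fabric non-blocking; in practice, many deployments accept 2:1 or 3:1 to save spine ports.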
But first, the performance figures. As of 2024, modern Ethernet networks deliver a throughput of up to 400 Gbit/s – in each direction, of course. There are even 800 Gbit/s products on the market, but unless you are sitting on a pot of gold, the prices make for sobering reading. In addition, 800GE (800 Gigabit Ethernet) is not yet fully specified by the Ethernet Alliance. Admittedly, if you have the necessary change for one or, more likely, several 800GE switches, this shortcoming will hardly concern you. Speaking of change: the street price of a single single-mode DR (Data Center Reach) transceiver for 800GE is between four and eight thousand euros net, depending on the manufacturer. Note, however, that the price per Gbit/s gradually decreases.
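As a quick sanity check on that last remark, converting the quoted 800GE transceiver prices into a per-Gbit/s figure is a one-line calculation (a small Python sketch using only the numbers mentioned above):

    # The article quotes 4,000 to 8,000 euros net for one 800GE DR transceiver.
    for price_eur in (4_000, 8_000):
        per_gbit = price_eur / 800   # euros per Gbit/s of port speed
        print(f"{price_eur} EUR -> {per_gbit:.2f} EUR per Gbit/s")
    # Prints 5.00 and 10.00 EUR per Gbit/s: the figure to compare against
    # slower transceivers is the per-Gbit/s cost, not the sticker price.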