Engineering Excellence: A Technical Review of the High-Performance Servers Powering the Veltrix Initiative

Core Architecture: Beyond Off-the-Shelf Hardware

The Veltrix initiative relies on a custom server design that departs from standard commodity hardware. Each node is built around a dual-socket AMD EPYC 9654 platform, with 96 cores per socket for a total of 192 cores and 384 threads per server. The platform was chosen for its high memory bandwidth and large number of PCIe 5.0 lanes, both critical for data-intensive workloads. Memory is DDR5-4800 across 24 channels (12 per socket), configured with 1.5 TB per node, so that large datasets can stay in local RAM instead of being fetched from storage, which reduces latency.
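The memory figures above can be cross-checked with a few lines of arithmetic. The sketch below derives core and thread counts and the theoretical peak DRAM bandwidth from the stated configuration. The layout of one 64 GB DIMM per channel is an assumption, since the text only gives the 1.5 TB total, and sustained bandwidth in practice will be lower than this theoretical peak.

```python
# Back-of-the-envelope node summary derived from the stated configuration.
# Assumption (not stated in the text): one 64 GB DIMM per channel.

SOCKETS = 2
CORES_PER_SOCKET = 96        # AMD EPYC 9654
THREADS_PER_CORE = 2         # simultaneous multithreading enabled
CHANNELS_PER_SOCKET = 12     # DDR5 channels per EPYC 9004-series socket
TRANSFER_RATE_MT_S = 4800    # DDR5-4800
BYTES_PER_TRANSFER = 8       # 64-bit data path per channel, ECC bits excluded
DIMM_GB = 64                 # hypothetical DIMM size


def node_summary() -> dict:
    cores = SOCKETS * CORES_PER_SOCKET
    channels = SOCKETS * CHANNELS_PER_SOCKET
    # MT/s x bytes per transfer = MB/s per channel; divide by 1000 for GB/s.
    per_channel_gb_s = TRANSFER_RATE_MT_S * BYTES_PER_TRANSFER / 1000
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "channels": channels,
        "peak_bandwidth_gb_s": channels * per_channel_gb_s,
        "capacity_gb": channels * DIMM_GB,
    }


if __name__ == "__main__":
    for key, value in node_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions each node has about 921.6 GB/s of theoretical peak memory bandwidth, and 24 channels of 64 GB give the stated 1.5 TB (1,536 GB).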

A key differentiator is a dedicated FPGA-based co-processor for I/O offloading. It handles network packet processing and storage virtualization, freeing CPU cycles for application logic. The nodes are interconnected by a custom InfiniBand NDR400 fabric that delivers 400 Gbps per link with sub-microsecond latency. For more details on the platform, visit https://veltrix-platform.com.
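To put the 400 Gbps link rate in perspective, the sketch below computes serialization delay, the time needed to clock a message onto a single link. It deliberately ignores protocol headers, encoding overhead, switch hops, and propagation delay, so it gives a lower bound on wire time rather than a model of end-to-end latency.

```python
LINK_RATE_BPS = 400e9  # NDR400: 400 Gb/s per link, as stated above


def serialization_delay_ns(message_bytes: int, link_rate_bps: float = LINK_RATE_BPS) -> float:
    """Nanoseconds to transmit message_bytes at link_rate_bps.

    Headers, encoding, switching, and propagation delay are ignored.
    """
    if message_bytes < 0:
        raise ValueError("message size must be non-negative")
    return message_bytes * 8 / link_rate_bps * 1e9


def link_gigabytes_per_second(link_rate_bps: float = LINK_RATE_BPS) -> float:
    """Raw link rate expressed in GB/s (1 GB = 1e9 bytes)."""
    return link_rate_bps / 8 / 1e9


if __name__ == "__main__":
    for size in (64, 4096, 1 << 20):
        print(f"{size:>8} B -> {serialization_delay_ns(size):10.2f} ns")
    print(f"link rate: {link_gigabytes_per_second():.0f} GB/s")
```

A 4 KiB message occupies the link for about 82 ns, so for small messages the wire time is only a small part of a sub-microsecond budget; the rest is consumed by switching and endpoint processing.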

Storage Topology: NVMe over Fabrics

Storage is not local to each node. Instead, a disaggregated pool of NVMe drives is connected via NVMe over Fabrics (NVMe-oF) using the InfiniBand network. This architecture allows any compute node to access any storage device with near-local latency. Each storage shelf holds 32 U.2 NVMe drives, delivering 1.2 million random read IOPS per shelf. Redundancy is provided at the fabric level, with multiple paths between compute and storage.
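Multipath redundancy of this kind is typically handled by a path-selection policy on the host. The sketch below models one common policy, round-robin across healthy paths with failover, for illustration only. The class names and the path labels (ib0, ib1) are hypothetical, and the text does not say which policy Veltrix actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """One fabric path from a compute node to a storage shelf (hypothetical model)."""
    name: str
    healthy: bool = True


@dataclass
class MultipathSelector:
    """Round-robin over healthy paths; failed paths are skipped."""
    paths: list
    _next: int = field(default=0, init=False)

    def select(self) -> Path:
        n = len(self.paths)
        # Try each path at most once per call, starting after the last one used.
        for _ in range(n):
            path = self.paths[self._next]
            self._next = (self._next + 1) % n
            if path.healthy:
                return path
        raise RuntimeError("no healthy path to the storage shelf")

    def mark_failed(self, name: str) -> None:
        for path in self.paths:
            if path.name == name:
                path.healthy = False


if __name__ == "__main__":
    selector = MultipathSelector([Path("ib0"), Path("ib1")])
    print([selector.select().name for _ in range(4)])  # alternates between paths
    selector.mark_failed("ib0")
    print([selector.select().name for _ in range(3)])  # all I/O moves to ib1
```

Round-robin spreads load across paths while they are all healthy; other policies (for example, preferring the path with the shortest queue) trade simplicity for better behavior under uneven load.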

Thermal and Power Management: Efficiency at Scale

Power density reaches 40 kW per rack, which requires direct-to-chip liquid cooling. Each server has cold plates attached directly to the EPYC CPUs, the GPU accelerators, and the FPGA. The coolant, a dielectric fluid circulated at 45°C, runs in a closed loop that rejects minimal heat to the data center air. Compared with traditional air-cooled systems, this reduces cooling energy consumption by 40%.
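Sizing a coolant loop for a given heat load follows from the heat balance Q = m_dot * c_p * delta_T. The sketch below estimates the mass flow needed to remove one rack's 40 kW. The specific heat and the supply-to-return temperature rise are hypothetical placeholders, since the text gives neither, and the sketch assumes all 40 kW is captured by the liquid loop, which makes the result an upper bound.

```python
RACK_HEAT_W = 40_000.0   # 40 kW per rack, from the text
ASSUMED_CP = 1_100.0     # J/(kg*K); hypothetical specific heat of the dielectric fluid
ASSUMED_DELTA_T = 10.0   # K; hypothetical supply-to-return temperature rise


def coolant_mass_flow_kg_s(heat_w: float, cp_j_per_kg_k: float, delta_t_k: float) -> float:
    """Mass flow that carries heat_w away with a temperature rise of delta_t_k.

    Rearranged from Q = m_dot * c_p * delta_T.
    """
    if cp_j_per_kg_k <= 0 or delta_t_k <= 0:
        raise ValueError("specific heat and temperature rise must be positive")
    return heat_w / (cp_j_per_kg_k * delta_t_k)


if __name__ == "__main__":
    flow = coolant_mass_flow_kg_s(RACK_HEAT_W, ASSUMED_CP, ASSUMED_DELTA_T)
    print(f"required coolant flow: {flow:.2f} kg/s per rack")
```

Because flow is inversely proportional to the temperature rise, halving delta_T doubles the required flow, so the permissible temperature rise is a key design parameter for the loop.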

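Power delivery uses a 48V bus architecture, with AC converted to 48V DC at the rack level. Compared with a conventional 12V bus, a 48V bus carries a quarter of the current for the same power, so resistive (I²R) distribution losses fall by a factor of 16 and thinner conductors can be used. Each server has a dedicated power-monitoring microcontroller that reports real-time wattage per component, and the management system dynamically adjusts clock speeds and voltages to match workload demand. Facility-wide, the combination of liquid cooling and efficient power delivery yields a Power Usage Effectiveness (PUE) of 1.08: total facility energy is only 8% higher than the energy consumed by the IT equipment itself.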
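Both quantitative claims in this paragraph reduce to simple formulas. The sketch below compares resistive loss on a 12V and a 48V bus for the same load, and converts a PUE value into facility overhead using PUE = total facility power / IT equipment power. The 12V baseline, the 1 kW load, and the 1 milliohm conductor resistance are illustrative assumptions, not figures from the Veltrix design.

```python
def conduction_loss_w(power_w: float, bus_voltage_v: float, resistance_ohm: float) -> float:
    """I^2 * R loss in a conductor delivering power_w at bus_voltage_v."""
    current_a = power_w / bus_voltage_v
    return current_a ** 2 * resistance_ohm


def facility_overhead_kw(it_load_kw: float, pue: float) -> float:
    """Non-IT power (cooling, conversion, etc.) implied by a PUE value.

    PUE = total facility power / IT equipment power, so overhead = IT * (PUE - 1).
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * (pue - 1.0)


if __name__ == "__main__":
    LOAD_W = 1_000.0          # hypothetical per-server load
    RESISTANCE_OHM = 0.001    # hypothetical distribution-path resistance
    loss_12v = conduction_loss_w(LOAD_W, 12.0, RESISTANCE_OHM)
    loss_48v = conduction_loss_w(LOAD_W, 48.0, RESISTANCE_OHM)
    print(f"12V loss: {loss_12v:.3f} W, 48V loss: {loss_48v:.3f} W, "
          f"ratio: {loss_12v / loss_48v:.0f}x")
    # PUE is a facility-wide average; applying it to one 40 kW rack is illustrative.
    print(f"overhead: {facility_overhead_kw(40.0, 1.08):.1f} kW")
```

Quadrupling the bus voltage cuts current by 4x and I²R loss by 16x, independent of the assumed resistance. At a PUE of 1.08, a 40 kW IT load implies about 3.2 kW of facility overhead.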

Networking and Data Flow: Deterministic Low Latency

Beyond the InfiniBand fabric, the servers have a secondary 100GbE network for management and bulk data transfer. The primary data path for real-time analytics uses Remote Direct Memory Access (RDMA) over InfiniBand, which lets one server read data directly from another server's memory without involving the remote CPU, achieving end-to-end latency under 2 microseconds for small messages.
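The key property of a one-sided RDMA read is that the initiator addresses a buffer the target registered in advance, and the target's adapter serves the request without running any application code on the target. The Python model below illustrates only those semantics. The class and method names are hypothetical; a real implementation would go through the verbs API (for example, posting an IBV_WR_RDMA_READ work request with ibv_post_send in libibverbs).

```python
from dataclasses import dataclass


@dataclass
class MemoryRegion:
    """A buffer its owner has registered for remote access (hypothetical model)."""
    buffer: bytearray
    rkey: int  # remote key the owner hands to trusted peers


class TargetAdapter:
    """Stands in for the target's network adapter; no application callback exists."""

    def __init__(self) -> None:
        self._regions: dict = {}

    def register(self, buffer: bytearray, rkey: int) -> None:
        self._regions[rkey] = MemoryRegion(buffer, rkey)

    def serve_read(self, rkey: int, offset: int, length: int) -> bytes:
        # The adapter checks the key and bounds, then copies bytes directly.
        region = self._regions.get(rkey)
        if region is None:
            raise PermissionError("unknown rkey")
        if offset < 0 or length < 0 or offset + length > len(region.buffer):
            raise IndexError("read outside the registered region")
        return bytes(region.buffer[offset:offset + length])


def rdma_read(target: TargetAdapter, rkey: int, offset: int, length: int) -> bytes:
    """Initiator side: name the remote bytes by (rkey, offset, length)."""
    return target.serve_read(rkey, offset, length)


if __name__ == "__main__":
    adapter = TargetAdapter()
    adapter.register(bytearray(b"hello veltrix"), rkey=0x1234)
    print(rdma_read(adapter, 0x1234, 6, 7))  # b'veltrix'
```

The access check on the key and bounds is what makes it safe to skip the remote CPU: only regions the owner explicitly registered are reachable.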

The network uses a fat-tree topology with full bisection bandwidth. The core switches use a custom ASIC that supports adaptive routing, steering traffic around congestion in under 100 nanoseconds. Packet loss is prevented by lossless link-level flow control; on the InfiniBand fabric this is InfiniBand's native credit-based scheme, which plays the role that Priority Flow Control (PFC) plays on Ethernet. Lossless delivery is critical for the high-frequency trading and simulation workloads that Veltrix processes.
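The difference between static and adaptive routing in a fat-tree can be shown with a toy example. Static routing picks an uplink from a hash of the flow, regardless of load; adaptive routing picks the least-congested of the equal-cost uplinks. The sketch below is a simplified model, not the switch ASIC's actual algorithm, which the text does not describe; the queue depths and port numbers are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    port: int
    queue_depth: int  # packets waiting on this port (illustrative congestion signal)


def static_uplink(uplinks: list, flow_id: int) -> Uplink:
    """Deterministic hash-style choice: the same flow always uses the same port."""
    if not uplinks:
        raise ValueError("no uplinks available")
    return uplinks[flow_id % len(uplinks)]


def adaptive_uplink(uplinks: list) -> Uplink:
    """Pick the shallowest queue; ties go to the lowest port number."""
    if not uplinks:
        raise ValueError("no uplinks available")
    return min(uplinks, key=lambda u: (u.queue_depth, u.port))


if __name__ == "__main__":
    uplinks = [Uplink(0, 40), Uplink(1, 3), Uplink(2, 12), Uplink(3, 3)]
    print("static:", static_uplink(uplinks, flow_id=4).port)   # port 0, congested
    print("adaptive:", adaptive_uplink(uplinks).port)          # port 1
```

In a fat-tree, every upward link toward the core leads to the destination at equal cost, so any uplink is a valid choice; that is what allows the switch to choose by load alone.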

FAQ:

What is the primary CPU used in Veltrix servers?

The servers use a dual-socket AMD EPYC 9654, providing 192 cores and 384 threads per node.

How is storage connected to the compute nodes?

Storage is disaggregated and connected via NVMe over Fabrics (NVMe-oF) using a custom InfiniBand NDR400 network.

What cooling method does the Veltrix initiative use?

Direct-to-chip liquid cooling with a dielectric fluid, which reduces cooling energy by 40% and contributes to a facility PUE of 1.08.

What networking technology provides the primary data path?

Remote Direct Memory Access (RDMA) over InfiniBand NDR400, delivering sub-2 microsecond latency.

How does the system manage power distribution?

It uses a 48V bus architecture at the rack level with per-component power monitoring and dynamic voltage/frequency scaling.

Reviews

Dr. Elena Voss, Lead Architect

The FPGA co-processor design eliminated our I/O bottleneck. We saw a 3x improvement in throughput on our data pipeline compared to standard EPYC systems.

Marcus Chen, Network Engineer

The adaptive routing on the InfiniBand fabric is exceptional. During peak load, latency variance remained under 100 nanoseconds. It is the most stable network I have deployed.

Sarah Jenkins, Data Scientist

Having 1.5 TB of local DDR5 per node means we rarely hit swap. The NVMe-oF storage feels as fast as local drives. It changed how we design our algorithms.
