High Performance Computing Grows Up
By David Rukshin, CTO, WorldQuant, LLC
As HPC technologies mature and become more commonplace, we observe a general trend of commoditization, to the point that it is now possible, and effective, to build high performance compute infrastructures without esoteric network fabrics and interconnects. While the InfiniBand interconnect still dominates the most powerful HPC clusters in Top 500 lists, its use overall within HPC has been declining in recent years. In contrast, Ethernet has become more common in HPC installations as speeds have increased and costs have come down. Moreover, as InfiniBand has matured over the years, becoming easier to manage and much better understood, it is easy to argue that it should no longer be considered an exotic interconnect.
This trend allows for a decreasing reliance on highly specialized staff. Instead, it calls for well-versed technology multidisciplinarians who can help tune HPC applications and infrastructure to work effectively together. As HPC technologies become less specialized, it is becoming possible to use HPC infrastructure for both specialized, distributed “supercomputing” applications and traditional applications adapted for the cloud infrastructure model. At the same time, HPC capabilities are emerging among public cloud providers, leading to a further blurring of the distinction between HPC and cloud architectures. At WorldQuant we are building application architectures that leverage both HPC and cloud infrastructure almost interchangeably.
This blending of HPC and cloud approaches requires applications to be built around modern data management techniques.
Distributed data stores facilitate seamless transitions from traditional enterprise architectures, and even traditional HPC architectures, to emerging HPC and cloud architectures centered around commodity network and storage technologies. Although the general trend is for HPC technologies to become less specialized, it will be interesting to see how the adoption of Graphics Processing Units (GPUs) and other accelerators influences that trajectory.
We observe a convergence of HPC technologies that have become commoditized and commodity infrastructure that has adopted technologies that have long been the mainstay of HPC. Examples of the former include increasing use of GPFS and Lustre file systems beyond specialized HPC deployments, as well as a maturing of pNFS technology that can work effectively in HPC and non-HPC deployments. Conversely, Ethernet performance has been gaining on InfiniBand, with bandwidth growing to 100 Gbps and a road map extending out to 400 Gbps. Indeed, late 2016 saw the entry of a supercomputer outfitted with 100 Gbps Ethernet in the Top 500 ranking for the first time. Ethernet’s rise does not stop at throughput, however, with the availability of RDMA over Converged Ethernet (RoCE) capabilities fueling superfast storage fabrics such as NVMe over Fabrics.
The implementation of data management techniques to effectively leverage converged HPC and cloud approaches is particularly important. Many financial use cases do not lend themselves to the MapReduce type of data solutions prevalent in the big data space. Time-series-based applications, such as those used in the backtesting of financial models, are often difficult to partition or reduce due to the inherent data node path dependence of time-series data and financial models. Such workloads are often a worst case for data locality, as applications can walk the entire data set as models traverse the time continuum. The ability to effectively leverage distributed data stores that lend themselves to workloads spanning a very large number of servers can become a source of competitive advantage for financial firms.
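To make the path dependence concrete, here is a minimal sketch in Python. It is a toy illustration, not a description of any production backtesting system: the stop-out rule, the `State` fields, the `run_backtest` function and the return series are all hypothetical and chosen only so the arithmetic is easy to follow. The strategy stays invested until its drawdown from the running peak exceeds a limit, then sits in cash. Because each step depends on state built up over all earlier steps, splitting the series into independent chunks gives a different answer than a single sequential pass.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class State:
    """Hypothetical running state carried from one time step to the next."""
    equity: float = 1.0    # current portfolio value
    peak: float = 1.0      # highest portfolio value seen so far
    stopped: bool = False  # True once the drawdown limit has been breached


def run_backtest(returns: Iterable[float],
                 state: Optional[State] = None,
                 max_drawdown: float = 0.10) -> State:
    """Walk the series in time order, updating the state at every step."""
    # Copy the incoming state so the caller's object is not mutated.
    s = State() if state is None else State(state.equity, state.peak, state.stopped)
    for r in returns:
        if s.stopped:
            continue  # in cash: later returns no longer affect equity
        s.equity *= 1.0 + r
        s.peak = max(s.peak, s.equity)
        if 1.0 - s.equity / s.peak > max_drawdown:
            s.stopped = True
    return s


# Illustrative per-period returns (made up for this example).
returns = [0.05, -0.20, 0.10, 0.10]
first_half, second_half = returns[:2], returns[2:]

# 1. One sequential pass over the whole history.
sequential = run_backtest(returns)

# 2. Naive map-style partitioning: each chunk starts from a fresh state,
#    and the per-chunk growth factors are multiplied together afterwards.
part_a = run_backtest(first_half)
part_b = run_backtest(second_half)
naive = part_a.equity * part_b.equity

# 3. Chunked, but the second chunk receives the first chunk's final state.
#    This is correct, yet the chunks can no longer run in parallel.
chained = run_backtest(second_half, state=part_a)

print(f"sequential: {sequential.equity:.4f}")
print(f"naive partitioned: {naive:.4f}")
print(f"chained: {chained.equity:.4f}")
```

In this sketch the second chunk has no way of knowing that the strategy was stopped out during the first chunk, so the naive result overstates performance. Passing the state along restores correctness but serializes the work, which is one reason such workloads tend to favor fast shared access to the full data set over MapReduce-style partitioning.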
A more commoditized HPC infrastructure is also making it possible to move away from the “big bang” three- or four-year upgrade cycle in which the entire hardware plant (servers, network and storage fabric) is upgraded at once. In its place comes a regime that allows an almost constant upgrade of components of the HPC infrastructure, leading to a much better and more efficient matching of HPC compute supply to demand. For example, it becomes possible to upgrade significant portions of the environment as new processor technologies get released as part of vendor “tick” and “tock” cycles.
As cloud providers enhance their HPC offerings, fully cloud-native solutions are becoming viable. Bare metal cloud, RDMA, GPUs and other accelerators are now available for rent by the hour. For organizations that are already leveraging cloud capabilities for other IT use cases, this approach can be even more compelling. It is important to note, however, that cloud pricing is often tailored toward workloads that are elastic in nature; users need to think carefully about how each HPC workload maps onto the cloud pricing model.
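One simple way to reason about that mapping is a break-even utilization calculation, sketched below in Python. Every figure here is a placeholder invented for illustration, not a quote from any provider or a real cost of ownership: the hourly rate, the amortized monthly cost of an owned node and the 730-hour month are all assumptions, and the function names are hypothetical. The sketch also ignores reserved or spot pricing, data transfer, staffing and power.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a calendar month


def break_even_utilization(owned_monthly_cost: float, cloud_hourly_rate: float) -> float:
    """Fraction of the month a node must be busy before owning it
    costs less than renting equivalent capacity by the hour."""
    break_even_hours = owned_monthly_cost / cloud_hourly_rate
    return break_even_hours / HOURS_PER_MONTH


def cheaper_option(utilization: float,
                   owned_monthly_cost: float,
                   cloud_hourly_rate: float) -> str:
    """Compare one month of pay-per-hour rental with a fixed owned cost."""
    cloud_cost = cloud_hourly_rate * HOURS_PER_MONTH * utilization
    return "cloud" if cloud_cost < owned_monthly_cost else "owned"


# Placeholder inputs, chosen only to keep the arithmetic simple.
CLOUD_HOURLY_RATE = 3.00      # hypothetical price per node-hour
OWNED_MONTHLY_COST = 1095.00  # hypothetical amortized cost per node-month

threshold = break_even_utilization(OWNED_MONTHLY_COST, CLOUD_HOURLY_RATE)
bursty = cheaper_option(0.20, OWNED_MONTHLY_COST, CLOUD_HOURLY_RATE)
steady = cheaper_option(0.90, OWNED_MONTHLY_COST, CLOUD_HOURLY_RATE)

print(f"break-even utilization: {threshold:.0%}")
print(f"bursty workload (20% busy): {bursty}")
print(f"steady workload (90% busy): {steady}")
```

Under these made-up inputs, a bursty workload that keeps a node busy a fifth of the time favors renting, while a backtesting farm that runs close to flat out favors owned capacity. The real threshold depends entirely on actual prices and on how elastic each workload is.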
Enterprises are now able to leverage technologies that were once rarely seen outside the rarefied air of the laboratory. In parallel, advances in commodity technologies bring them closer in performance to their more exotic kin. There is still some way to go before the big data stack is a natural fit for the time-series workloads that are the staple of the financial services industry, but it is now possible to build systems architectures that blend components of High Performance Computing, big data and public cloud.