By Scot Schultz
High Performance Computing (HPC) in the cloud has the potential to improve a wide variety of industries, but this trend is not impressive simply based on its potential—there are already real world cloud applications leveraging HPC today that have led to incredible accomplishments. HPC in the cloud has enabled research into the effects of Climate Change, helped run scenarios to find cures for rare diseases and aided global energy companies in maximizing the power output of tens of thousands of wind turbines around the world for affordable, renewable energy. HPC has many different advantages depending on the specific use case, but one aspect that these implementations have in common is their use of RDMA-based fabrics to improve compute performance and reduce latency.
The low latency and high performance benefits that Remote Direct Memory Access (RDMA) fabrics, such as InfiniBand and RDMA over Converged Ethernet (RoCE), offer are crucial to the development of HPC applications. RDMA’s ultra-low latency allows minimal processing overhead and exponentially increases overall system performance, which applications such as Artificial Intelligence (AI) and Machine Learning demand while transferring significant amounts of data, exchanging messages and computing results.
Government agencies, research institutions and universities used to be part of the select few that leveraged HPC for scientific breakthroughs and academic studies. However, over the past few years, private enterprises such as cloud service providers, such as Microsoft Azure, IBM Cloud, Profit Bricks and others saw the many benefits that HPC can bring to their business and have begun making investments to take advantage of everything that High Performance Computing has to offer.
A growing number of organizations are also using cloud-based HPC applications as it allows them to be more efficient and conduct innovative studies that lead to new breakthroughs in their respective fields. By moving HPC workloads to the cloud, data can be offloaded from a local server and delegated to remote servers for storage and computing. This reallocation enables helpful features like serverless computing which provides a level of flexibility and affordability that just can’t be beat by in-house servers. Serverless computing expedites application deployment by allowing the user to immediately concentrate on developing the code rather than focusing on developing the infrastructure. Swift application implementation also enables faster innovation and tangible results, making HPC in the cloud not only quick, but efficient as well.
For example, Australia’s National Computational Infrastructure (NCI) offers cloud-based HPC resources to a number of research efforts, including one project that aims to restore the continent’s Great Barrier Reef to its former glory. In 2016, NCI leveraged 100Gb/s EDR InfiniBand for its newest Lenovo NextScale supercomputer, adding a 40% performance increase in NCI computational capacity. Due to this increase in performance and capacity, NCI was better suited to help researchers learn important facts about the bleaching and destruction of the Great Barrier Reef in early 2017. By identifying this bleaching trend so quickly, researchers were able to map out how these events impact Australia and the world at large, and to pursue further understanding and possible solutions for this pressing issue.
InfiniBand can also be leveraged for research conducted to find a cure for Rett Syndrome, a rare postnatal neurologic disorder that occurs almost exclusively in females and leads to lifelong disabilities. The current research on Rett Syndrome has proven that the disease can be cured, but this research requires high performance computing and cloud-based solutions to turn the possibility of a cure into a reality. HPC and cloud-based solutions enable researchers around the globe to run millions of scenarios, weed out nonviable solutions and focus on finding discoveries that can end the unnecessary suffering caused by this disease. InfiniBand provides the necessary bandwidth that has the potential to allow researchers from all over the world to share their findings and collaborate with others in their field. With numerous clinical trials underway, a full-on cure is within reach.
Wind energy is another field that cloud-based RDMA is optimizing. Vestas is a global energy company that concentrates on the development of industrial scale wind energy projects. The company designed an in-house, InfiniBand-based cloud that performs global weather simulations to proactively optimize turbine operational parameters and accelerate the predictive analysis of energy demand. These real-time simulations help Vestas maximize the return from tens of thousands of wind turbines across the world and reduces the cost of energy for their customers while creating clean energy for the planet.
So why does HPC in the cloud matter? Moving HPC to the cloud allows businesses to affordably expand their system capabilities and infrastructure, gives researchers located around the world the opportunity to collaborate on medical breakthroughs and enables real-time simulations that have the power to better our world. However, none of these breakthroughs would be possible without the low latency and high performance capabilities that RDMA provides for HPC applications. If HPC in the cloud is a door to a brighter future, then RDMA-based fabrics are the key to unlocking the potential behind that door.