Will Exascale Computing Change Everything? Top Experts Weigh In
The difference between an exascale computer and a laptop is the difference between a professional wrestler and a chipmunk: a lot of power. Forthcoming exascale computers will be roughly a million times more powerful than the computers civilians use — more powerful, in fact, than any computer in existence.
The world’s fastest supercomputer, Summit, lives at Oak Ridge National Laboratory in Tennessee and looks like a cross between a server farm and a superhero; deep blue lightning bolt iconography adorns its glowing, floor-to-ceiling rows of hardware.
Summit can do 200 million billion calculations per second. An exascale computer, though, will be at least five times faster, capable of performing a quintillion — a billion billion — calculations per second.
“[O]f course there’s nothing magical about exascale,” writes Steve Scott, CTO of high-performance computing company Cray. “It’s just one point along a continuum [of computing power].”
But top-tier computing power is key to all kinds of research in industry and academia, as Scott and many other experts in the field note. Among other benefits, it facilitates better and quicker simulations of everything from climate events and drug discovery to cancer treatments and the birth of the universe. It can also bolster national security.
The American exascale journey is slated to start in 2021 outside Chicago at Argonne National Laboratory, which is preparing for the arrival of exascale behemoth “Aurora.” It’s one of three government-funded exascale computing projects now underway in the U.S.
So what can exascale offer us comparatively pea-brained mortals, who would need trillions of years to complete calculations that an exascale computer can do in one second? We asked three exascale experts.
What can exascale computers achieve that current supercomputers can’t?
Whitt, project director for the Department of Energy’s Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory (ORNL), currently at work on an exascale project for 2021
At exascale, we’re talking about a quintillion calculations per second. I mean, that just starts to lose meaning. Somebody recently put it this way to me: If you fill a swimming pool with sand and there are a billion grains of sand in the pool, a billion pools filled with a billion grains of sand each is how many calculations per second we’re talking about.
The types of problems researchers can tackle change with exascale, as do the size of the problems that can be solved and the fidelity of the solutions. We see a lot of scientists bringing in large data sets, using artificial intelligence to make inferences that a human could never make. Exascale opens up opportunities to do that more quickly. It allows scientists to probe problems that were intractable before, or maybe that they couldn’t complete because they ran too long.
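The swimming-pool analogy above checks out arithmetically; a quick, illustrative sketch (not from the interview):

```python
# Sanity check of the swimming-pool analogy (illustrative arithmetic only).
grains_per_pool = 10**9              # a billion grains of sand in one pool
pools = 10**9                        # a billion such pools
total = grains_per_pool * pools      # calculations per second at exascale
assert total == 10**18               # one quintillion (a billion billion)
print(f"{total:.0e}")                # prints 1e+18
```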
Diachin, deputy associate director for Science and Technology in the Computing Directorate at Lawrence Livermore National Laboratory (LLNL) and deputy director of the Department of Energy’s Exascale Computing Project (ECP)
Exascale computers provide unprecedented processing power and memory so that researchers can perform very large-scale simulations. The ECP is also investing in 24 application areas that are critical to the DOE mission in basic sciences, applied energy and national security. For example, if successful, our applications will be able to optimize power grid planning, forecast water resources and severe weather and help our nation maintain a safe and secure nuclear stockpile.
Kasthuri: For my research, I map brains at a very fine scale, at the scale at which neurons connect. On average, a neuron in the brain connects to about 10,000 other neurons. In the human brain, there are approximately a hundred billion neurons, or something like a quadrillion connections. It turns out that I want to image every one of those connections. Even a mouse brain, which is 0.1% the volume of a human brain—that’s still an enormous data set. It’s at least an exabyte dataset. It’s really important to see how an entire brain works, but right now, we can’t do that. We need faster computers and better algorithms. We’re nowhere near collecting an exascale data set, either. The state of the art right now is a petascale data set. That’s 0.1% of exascale.
Once exascale computing is a thing, though, I think there will be a ton of disease applications. I want to compare a brain that has Alzheimer’s with a healthy brain, or a brain that has autism. To do that, I’m going to have to compare dozens, maybe hundreds of autistic brains and neurotypical brains to really understand the physical difference, to ultimately help with the disease.
I’m also interested in what’s so different about an octopus brain versus a primate brain. On one level, the octopus is one of the most alien life forms relative to us. It has no bones, it lives in the water, it has a completely different lifestyle. And it’s clearly really smart. So if we had a map of an octopus brain at the same level that we had a map of a mouse brain or a primate brain, how different would it look? Are there only a couple of ways that brains can achieve smartness? Or are there 10 different ways of designing a brain to make an animal smart? I think we’ll be able to address these kinds of evolutionary questions with exascale.
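The scales Kasthuri cites can be sanity-checked with quick, order-of-magnitude arithmetic (the figures are approximate by design):

```python
# Order-of-magnitude figures from Kasthuri's description (approximate by design).
neurons = 10**11                     # ~a hundred billion neurons in a human brain
connections_per_neuron = 10**4       # ~10,000 connections per neuron
assert neurons * connections_per_neuron == 10**15   # ~a quadrillion connections

petabyte, exabyte = 10**15, 10**18
assert petabyte / exabyte == 0.001   # a petascale data set is 0.1% of exascale
```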
What drew you to high-performance computing?
Whitt: I started out as a consumer of high performance computing. I was a computational scientist studying fluid dynamics and using supercomputers to optimize different designs. I came to the lab as a scientist, and I became more and more involved with the design and with the operation of supercomputers over time. One thing led to another, and I joined Oak Ridge Leadership Computing back in 2014 as the deputy project director for the Summit Computing project.
Kasthuri: When I completed my Ph.D. and I was on the job market, I’d go someplace and they’d ask, “What do you need?” And I said, “Nothing, I just need to be able to someday analyze an exabyte dataset.” Basically, I could go to Google, which some of my collaborators have done for the algorithm side, or go to a national lab like Argonne. But to have a functioning neuroscience lab, I can’t be the only neuroscientist among, like, condensed matter physicists. How do I know what the interesting topics in my field are, the colleagues to talk to, et cetera? University of Chicago manages Argonne, so working here was an almost completely ideal solution.
How is using an exascale computer or supercomputer different from using a desktop?
Whitt: In some ways, you could almost be fooled into thinking that you’re at a super-powered desktop. We have some users that are here at the laboratory, but most of our users access the system remotely. So they’re sitting at home with their laptop, or at work with their desktop.
But there are huge differences in programmability. When you have something this big, you have to learn how to exercise the parallel nature of the computer. You write equations that represent the governing physics of the problems you’ll solve, and then you have to have some way of farming out pieces of that to all the different compute nodes.
Kasthuri: Let’s say you want to run a huge chemistry simulation—you have to learn how to parallelize it on high-performance computing. It’s easy to think, oh God, I have 10,000 pictures in my data set, this is going to take forever to analyze. One way to go fast is to take each picture, give it to a different node of a high-performance computer, have them all analyze at the same time, and then return the results. That sounds simple when I say it in my mind, but there’s a whole bunch of timing issues. When does this algorithm run? How do you allocate memory? It takes a real level of sophistication to take an algorithm and parallelize it to capitalize on a supercomputer’s performance capabilities.
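The data-parallel pattern Kasthuri describes can be sketched in a few lines. This is a minimal illustration on a single machine, not his lab’s actual pipeline: `analyze()` is a hypothetical stand-in for a real image-analysis routine, and on a supercomputer the workers would be compute nodes (e.g. coordinated via MPI) rather than local processes.

```python
# Minimal sketch of data parallelism: hand each "picture" to a worker,
# let all workers run at the same time, then collect the results.
from concurrent.futures import ProcessPoolExecutor

def analyze(picture_id: int) -> int:
    # Placeholder "analysis": derive one number per picture.
    return picture_id % 7

if __name__ == "__main__":
    pictures = range(10_000)                  # stand-in for 10,000 images
    with ProcessPoolExecutor() as pool:       # one worker per CPU core
        results = list(pool.map(analyze, pictures))
    print(len(results))                       # prints 10000
```

Even in this toy form, the coordination issues he mentions show up: the executor has to schedule work, move data to each worker, and gather results back in order.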
What are some of the hardware challenges involved in making computers that work at exascale speeds?
Whitt: To get exascale performance, the systems themselves have had to become denser over time; we’re packing the components tighter and tighter into a given space. The Frontier exascale project, for instance, has a lot of components that folks would recognize from their desktop or laptop. It has a central processing unit, or CPU, and it has graphics processing units, or GPUs. The difference is that we use the GPUs to do computations and calculations, instead of rendering graphics. They are really good at certain types of calculations. So what you’ll find is that we’ll have a CPU coupled with multiple GPUs. When you count the processors that help route network traffic, too, this turns into a computer that has hundreds of thousands of processors.
If you look at the average failure rates on any of these components, designing something that’s resilient enough to be reliable, so that you’re not constantly plagued by failures, is another big problem that we have to solve.
Diachin: To achieve exascale computing requires solving hardware challenges at many different levels. First, consider the computer processor. We’ve come to the end of an era in which computing speeds can be doubled every 18 months by adding more and more transistors to the silicon chip. Transistors have become so small that they have reached their physical limits, in terms of electrical current leakage and physical heat generation. To overcome this challenge, exascale computing needs to tap into the latest innovations in microarchitectures, like increasing the number of processing cores.
It’s also getting harder to move enough data to the processing cores to keep them working productively at peak speeds. We need innovations in memory architectures and interfaces. The overall system is composed of thousands of processing nodes with tens of thousands of processing cores. This creates challenges for the overall system architecture too, including, for example, the interconnect fabric between the cores and nodes.
How much power would an exascale computer need, and what factors would its power consumption depend on?
Whitt: These systems use tremendous amounts of power. We’re expecting that ORNL’s Frontier will use 30 to 40 megawatts of power. That’s as much as maybe 40,000 homes would use.
Construction is underway here at Oak Ridge National Lab on the space that will house this computer. It’s about 20,000 square feet, and it will supply that 40 megawatts of power. But if I put that much power in, I have to be able to get that much heat back out. The only way to get that much heat out of a small space is through water-cooling systems.
Do you see any potential drawbacks to exascale computing?
Whitt: I don’t. A lot of the science that we expect to run on this system will involve large datasets, and some of that data is health care data. There are a lot of questions about data protection and protecting people’s privacy. Who owns the data?
In that context, there are no negatives associated with exascale. That’s one of the benefits of sensitive data analysis taking place in national labs. You have tremendous physical security and tremendous cybersecurity in place.
Diachin: It’s not really a drawback, but computer hardware is becoming increasingly specialized to get more and more performance from the system. This is true at both the exascale and desktop levels of computing. It means the mathematical algorithms and associated software for scientific computing are becoming increasingly complex, and it takes a strong team of experts to handle all aspects of software development, from programming models to data analysis, and to maximize performance.
Answers have been condensed and edited. Images courtesy of Shutterstock and interviewees.