Science Needs a Software Upgrade
Science is messy. That’s why there’s some truth to the cliché of the scatterbrained professor. Not all scientists are like that, of course, but new theories and ideas are often birthed from downright chaos.
At the same time, many scientists are very clean and diligent when it comes to everyday tasks like keeping their desks tidy or their email inboxes uncluttered. This may sound paradoxical, but it really isn’t. All that mental mess needs to be managed somehow, and orderliness is a good strategy for keeping it contained within clear boundaries.
They set those boundaries by insisting on tidiness for anything that’s related to, but not directly part of, the mess that is science. That includes emails or the classes they teach, but also the experimental apparatuses they employ and any other tools they’re using.
It seems self-evident that this dogma of cleanliness should also apply to the software they write. That, however, is not the case. In particle physics and cosmology, where I work, messy code exists wherever you look. It’s usually poorly documented, contains hardly any tests, and older software often isn’t even version-controlled.
As I will show below, this phenomenon isn’t unique to my field. Quite the contrary: it’s commonplace in every academic field in which scholars write code. Software in academia is very different from its industry counterpart, and with good reason. Despite these legitimate differences, however, compelling reasons exist to change how code is currently written in the scientific world.
Industry Versus Academia
Scientific computing is much more focused on prototyping than application development. Scientists usually build software with something specific in mind that they’d like to learn more about. When they’re done learning and have published their findings, the software is only useful for legacy and reproducibility purposes. As a result, a lot of scientific code is used for just a few analyses and then abandoned.
This stands in stark contrast to software’s role in industry, where software is a consumer product. It’s built to be sold, and customers are expected to use it again and again. Consider, for example, a newly developed accounting program. Once it’s sold, users will put it to work every time they have some accounting to do. Conversely, simulation software for Higgs bosons becomes largely obsolete once researchers know more about these particular bosons, because they’ll move on to other research questions.
Apart from speed to obsolescence, another fundamental difference is that industry software often gets distributed to a wide user base with varying degrees of technical expertise. Academic software, on the other hand, is rarely used by anyone but the scientists themselves. Although it might seem simple, this distinction has far-reaching consequences, which I will address below.
These differences aren’t set in stone, though. In science, some code lives for a long time. One example among many is multi-purpose simulation software in particle physics, with which users can simulate any exotic particle they can imagine. But since most code in science isn’t written with longevity in mind, even the programs that academics hail often don’t live up to industry standards.
The Academic Publication Economy
The number of publications that a scientist has written is a widely used but often resented metric. Although many scientists would like to see it differently, a “publish-or-perish” mentality pervades many institutions. In other words, a scientist who doesn’t publish papers regularly has little chance of getting a respected position in academia. This culture in itself doesn’t explain why scientists don’t invest in clean code, though. The real problem is rooted in the way that academic journals handle the papers.
The scientific content of every paper gets peer-reviewed, meaning a panel of experts independently evaluates the rigor and accuracy of its claims. That process only applies to the scientific part of the article, however. Even though scientists often include descriptions of the software that they’ve written, the code itself rarely gets reviewed. Often, the peer reviewer needs to trust the author that the software works correctly because the code is rarely open source. As a result, the review process doesn’t exercise oversight on coding.
Of course, authors and reviewers aren’t dumb; they know that software is rarely perfect and needs to be tested for bugs. The problem is that academia offers no incentive to do additional testing, to document the code, or to make it open source, because little funding exists for that purpose. Scientists can apply for grants to finance their research, but seldom for the software they write.
As a result, from a scientist’s perspective, there is no reason to invest in cleaner code. Doing so will neither increase their chances of getting better reviews on their papers nor provide access to more grants and better financial compensation.
The Shaky Status Quo of Code in Academia
All of the above explains why software in science is the way it is. Since code typically doesn’t live very long, the number of users for any particular software is very limited. Often, just the main developer and one or two collaborators ever even read and use a piece of code. This structure explains why scientists — sometimes rightfully so — don’t see any need for extensive testing, writing proper documentation, or open-sourcing the code.
The problem is that scientists aren’t perfect, and bugs inevitably sneak into the work. And if not even a peer reviewer has glanced at the code, how can anyone be sure that it does what the scientists claim it does?
In fields like mine, but also in meteorology, economics, the life sciences, and many more, breakthroughs would be impossible without efficient software. But if nobody ever tests the software in its entirety, scientific findings often stand on shaky ground.
Solutions for Practicing Scientists
The problem of sloppy code in academia boils down to an issue of culture, not to the laziness or incompetence of individual scientists. Still, it’s in the interest of every single scientist to make sure their code works properly. After all, who would want to retract a paper because the underlying software was later found to be buggy?
One way to validate scientific software is to compare the results to those that have been produced with similar software. Given the short lifespan of lots of software in science, this is a relatively easy method to build confidence in self-authored code.
I support this method, but it shouldn’t be the only way to check for bugs. If a deviation occurs, scientists can’t tell whether the bug is in the new program or in the one it is being compared against. I myself have spent weeks searching for bugs in my software because it didn’t produce the results that another program did, only to find that the other software was faulty and not my own.
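As a rough illustration of the comparison approach described above, the following Python sketch flags the entries where two programs disagree beyond a tolerance. The function name `compare_results`, the tolerances, and the sample numbers are all made up for illustration and are not taken from any real analysis.

```python
import math


def compare_results(new, reference, rel_tol=1e-6, abs_tol=1e-12):
    """Return the indices at which two result sequences disagree."""
    if len(new) != len(reference):
        raise ValueError("result sequences differ in length")
    return [
        i
        for i, (a, b) in enumerate(zip(new, reference))
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Hypothetical outputs of a new program and of an established one.
new_code = [1.000, 2.500, 3.7501]
reference_code = [1.000, 2.500, 3.7400]

mismatches = compare_results(new_code, reference_code, rel_tol=1e-4)
print(mismatches)  # indices where the two programs disagree
```

Note that a non-empty list only says that the two programs disagree. It cannot say which of them is wrong, which is exactly the limitation described above.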
Automated unit tests are another technique. Each one takes only a few minutes to write, and scientists can add them as they go. This way, they can easily check for duplicated code, misused variables, or logical breaks. If scientists are stuck on what else they should write tests for, working through every use case in the user manual is a good way to go. And yes, they should write a manual, even if it’s short! This document becomes useful when other scientists decide to work on the code and need a quick introduction, and it helps the peer reviewer when a paper is submitted for publication.
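To give a sense of how small such tests can be, here is a minimal Python sketch. The function `invariant_mass` is a hypothetical example chosen for illustration (it uses natural units, where mass squared equals energy squared minus momentum squared), and the three test cases stand in for use cases a short manual might list.

```python
import math


def invariant_mass(energy, px, py, pz):
    """Invariant mass of a particle from its energy and momentum (natural units)."""
    m_squared = energy**2 - (px**2 + py**2 + pz**2)
    if m_squared < 0:
        raise ValueError("energy is smaller than the momentum magnitude")
    return math.sqrt(m_squared)


def test_particle_at_rest():
    # With zero momentum, the mass equals the energy.
    assert math.isclose(invariant_mass(0.511, 0.0, 0.0, 0.0), 0.511)


def test_massless_particle():
    # Energy equal to the momentum magnitude means zero mass.
    assert invariant_mass(5.0, 3.0, 4.0, 0.0) == 0.0


def test_unphysical_input_is_rejected():
    # Momentum larger than energy should raise instead of returning nonsense.
    try:
        invariant_mass(1.0, 2.0, 0.0, 0.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

Functions named `test_*` like these can be collected and run automatically by a test runner such as pytest, so they cost nothing to rerun after every change.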
One study of a well-established hydrology software package found that only about 80 percent of the user manual had been unit tested. In addition, only about half of the code base contained tests. These numbers would probably be even lower for less-established codes.
One way to write robust software quickly is test-driven development. In this method, each requirement is turned into a test case before any code is written. The developer then writes just enough code to pass the first test, extends it until it also passes the second, and so on up to the very last test case. This approach forces developers to think about tests and robustness from day zero without compromising on development speed.
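The workflow described above might look like the following Python sketch. The histogram helper `bin_index` and its three requirements are invented for illustration. The tests appear first because, in test-driven development, they are written first; the implementation below them is what remains after it has been extended to pass each test in turn.

```python
# Step 1: each requirement becomes a test before any implementation exists.

def test_value_inside_range_lands_in_its_bin():
    assert bin_index(0.0, low=0.0, high=10.0, n_bins=5) == 0
    assert bin_index(2.5, low=0.0, high=10.0, n_bins=5) == 1


def test_upper_edge_belongs_to_last_bin():
    assert bin_index(10.0, low=0.0, high=10.0, n_bins=5) == 4


def test_value_outside_range_has_no_bin():
    assert bin_index(-1.0, low=0.0, high=10.0, n_bins=5) is None
    assert bin_index(10.1, low=0.0, high=10.0, n_bins=5) is None


# Step 2: the implementation, grown one test at a time.

def bin_index(value, low, high, n_bins):
    """Return the index of the equal-width bin containing value, or None."""
    if value < low or value > high:
        return None           # added to satisfy the third test
    if value == high:
        return n_bins - 1     # added to satisfy the second test
    return int((value - low) / (high - low) * n_bins)  # enough for the first test
```

The edge cases (the upper boundary and out-of-range values) are the kind of detail that is easy to forget when tests are written after the fact.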
Scientists should also go beyond unit testing and consider system, integration, acceptance, and regression tests. One literature review found that 70 percent of the papers it examined didn’t use more than one testing technique, even though combining techniques makes software more robust. To ensure the reproducibility and integrity of scientific findings, the underlying code needs to be thoroughly tested.
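Of the techniques listed above, a regression test is perhaps the easiest to add to existing code: store the output of a trusted version once, then check that later versions still reproduce it. The Python sketch below assumes a hypothetical `simulate` function as a stand-in for a real computation and writes the reference to a temporary JSON file; in practice the reference file would live alongside the code under version control.

```python
import json
import math
import tempfile
from pathlib import Path


def simulate(n_steps, step=0.1):
    """Hypothetical stand-in for a scientific computation."""
    return [math.exp(-step * i) for i in range(n_steps)]


def save_reference(path, values):
    """Store the output of a trusted version of the code."""
    path.write_text(json.dumps(values))


def matches_reference(path, values, rel_tol=1e-9):
    """Check new output against the stored reference, within a tolerance."""
    reference = json.loads(path.read_text())
    return len(reference) == len(values) and all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(reference, values)
    )


with tempfile.TemporaryDirectory() as tmp:
    ref_file = Path(tmp) / "reference.json"
    save_reference(ref_file, simulate(5))

    # Same computation: should still match the stored reference.
    unchanged = matches_reference(ref_file, simulate(5))
    # A changed parameter mimics an accidental change in behavior.
    changed = matches_reference(ref_file, simulate(5, step=0.2))

print(unchanged, changed)
```

A failing regression test does not prove the new version is wrong, only that its behavior changed, which is usually what a scientist wants to know before rerunning an analysis.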
Solutions for Scientific Institutions
Scientists can and should play a large part in reforming this situation, but a great deal of the weight needs to fall on institutions. Research labs, universities, scientific publications, and government agencies alike need to shoulder their part of the responsibility for fostering a better culture around scientific software. One clear objective is to incentivize scientists to write clean code. Even if the lifespan of most scientific software is short, we cannot let groundbreaking science rely on unreviewed code.
We need to embrace the fact that code is often an integral part of scientific discovery. As such, it deserves the same level of scrutiny as the scientific part of a paper. Therefore, journals should peer-review any software that leads to scientific results. Of course, this review should not evaluate the elegance or innovative nature of the code, but rather make sure that it’s fault-proof and the derived results are reliable.
In addition, government agencies and any other institutions that issue grants must make software an explicit component of a grant. Even better, they should distribute special grants focused on software in science. Doing so would give scientists an additional incentive to write clean and robust code, since it might have a more tangible influence on their careers.
Signs of Hope
Compared to where scientific code was a decade ago, academia has made a huge leap. Even though the state of the art is far from perfect, there are signs that science is heading in the right direction when it comes to code.
The Journal of Open Source Software, for example, is a publication that encourages researchers to share their software packages with everyone. Not only does open-source code make the scientific process more transparent and reproducible, but it also forces researchers to write proper documentation and test their software for various use cases since they don’t want their inboxes filled with dozens of questions about their program.
Where funding is concerned, the Association for Software Testing is going in a positive direction as well. It organizes conferences and actively encourages scientists to do more software testing. Even more helpfully, it also adds financial incentives to write better code by issuing grants and scholarships.
In a way, it’s sad that academia needs these kinds of initiatives to move forward. The problem that scientific breakthroughs often rely on unreviewed code is a pretty obvious one. Indeed, the issue is so obvious that publications and government agencies should be aware of it by now and be trying to fix it. They are. But oftentimes, they’re doing it more slowly than they should.
The Bottom Line: Science Is Messy, but Scientific Software Doesn’t Have to Be
Science is a messy process. As a scientist, some days I leave the lab at night thinking that I have it all figured out, only to discover the next morning that I’d ignored something important in my calculations that shifts my understanding of the problem completely. These changes don’t happen in a linear or plannable manner, though. One day I’m stuck, another day I move forward at lightning speed. That’s the way it is, and it’s fine that way.
Working with scientific software, however, doesn’t need to be as messy as science itself. Quite the contrary: robust, extensively tested, well-documented, and open-source software should be part of high-quality, replicable science.
This requirement isn’t just for software aficionados. It’s understandable that scientists view software as a tool, not as their end goal. It’s also understandable that the field is the way it is right now, given the scarcity of incentives for writing good code. But as long as major scientific breakthroughs rely on computer code, that code must be reviewed like every other component of a paper.
Academia is moving in the right direction in that respect. But it still has a long way to go.