How to Build Safer and More Reliable Software

Innovative software continues to disrupt society as we know it, solving big problems, squeezing out new efficiencies and creating a better quality of life.

4 Common Software Issues

Connectivity issues
Usage of incompatible units of measurement
Security vulnerabilities
Poor change management

More and more major businesses and industries are run on software, from movies to agriculture to national defense. Healthcare also benefits from software-fueled mobile and AI technologies to accelerate innovations and address growing physician and other healthcare staffing shortages.

Yet bad software can cause irreparable harm across the very industries and infrastructure it’s transforming. McKinsey estimates that 70 percent of medical device recalls are due to a software issue.

Software recalls can be costly in terms of financial losses and damage to a company’s reputation, and the underlying defects can also cause physical harm and even death. Some recent examples include:

An automaker’s recall of more than 360,000 vehicles in February stemmed from issues with the company’s self-driving auto software.
An airline’s December 2022 meltdown that left tens of thousands of stranded travelers was largely due to a legacy enterprise software system from the 1990s.
Medical device recalls in 2020 alone ranged from insulin pumps to software-operated devices responsible for monitoring heart and brain activity.

How can companies reduce risk and ensure that their software is safe and reliable? I believe the answers can be found in a strong application lifecycle management (ALM) system that includes continuous testing and change management. But first, let’s look at some of the most common software issues.

Connectivity Issues

According to the U.S. Food & Drug Administration, connectivity or interoperability between different systems is one of the biggest issues. The easy exchange of information between applications, databases and other computer systems is crucial for the modern economy. Yet all too often, systems are developed by different teams that run into trouble when transferring information back and forth and when verifying and validating the overall system.

A recent recall of an infusion pump illustrates this point. Patients in an ICU get connected intravenously to the infusion pump. The infusion pump has a companion app, used by a physician on either a desktop or tablet, that regulates the delivery of the prescribed dosage at the appropriate times. However, in one recent case, the manufacturer updated the companion app without making a corresponding update to the device software itself. As a result, patients were at risk of over-, under- or mixed-dosing.
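
One way to guard against this kind of mismatch is to have the app refuse to send commands to a device whose software it was not validated against. The following is a minimal sketch of that idea, not a description of how any real pump works. The names Version, is_compatible and send_dose_command are hypothetical, and the compatibility rule (same major version, device minor version at least as new as the app expects) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """A hypothetical two-part protocol version such as "2.3"."""
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor = text.split(".")
        return cls(int(major), int(minor))


def is_compatible(app_expects: Version, device_reports: Version) -> bool:
    # Assumed rule: major versions must match exactly, and the device
    # must be at least as new as the minor version the app was built for.
    return (
        app_expects.major == device_reports.major
        and device_reports.minor >= app_expects.minor
    )


def send_dose_command(app_expects: Version, device_reports: Version,
                      rate_ml_per_hr: float) -> str:
    # Fail loudly instead of sending a command the device may misread.
    if not is_compatible(app_expects, device_reports):
        raise RuntimeError(
            f"App expects protocol {app_expects}, device reports {device_reports}"
        )
    return f"SET_RATE {rate_ml_per_hr:.1f}"
```

With a check like this, an app updated to expect protocol 3.0 would raise an error when paired with a device still reporting 2.3, rather than silently sending a dosage command.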

Using Incompatible Units

Other common issues result from the unintended use of a different unit of measurement coupled with poor communication between teams. The Mars Climate Orbiter, launched in 1998 at a cost of $125 million, was famously lost the following year because of a discrepancy in units of measurement. The navigation team used metric units in its calculations, while the spacecraft’s manufacturer provided crucial acceleration data in English units of inches, feet and pounds. The spacecraft was effectively lost in translation.

While this happened many years ago, time hasn’t resolved the issue. Different teams own different pieces of software, and when they inadvertently use different units, poor communication can turn a simple disconnect into a catastrophic defect.
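
One common defense is to stop passing bare numbers between teams and attach the unit to the value itself. Here is a minimal sketch of that approach. The Impulse class and the choice of pound-force seconds versus newton-seconds are assumptions made for illustration, not a reconstruction of any actual flight software.

```python
from dataclasses import dataclass

# 1 pound-force is defined as exactly 4.4482216152605 newtons.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse stored internally in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Convert once, at the boundary, so every later calculation is metric.
        return cls(value * LBF_S_TO_N_S)

    def __add__(self, other: object) -> "Impulse":
        # Refuse to add a bare number: its unit is unknown.
        if not isinstance(other, Impulse):
            return NotImplemented
        return Impulse(self.newton_seconds + other.newton_seconds)


# Two teams, two unit systems, one consistent total.
total = Impulse.from_newton_seconds(10.0) + Impulse.from_pound_force_seconds(2.0)
```

Because callers must say which unit they are supplying, a value in English units cannot be mistaken for a metric one, and adding a plain float to an Impulse raises a TypeError instead of producing a wrong answer.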

Security Vulnerabilities

The FDA is increasingly concerned about security vulnerabilities and the possibility that a hacker could use an insulin pump or pacemaker as an attack vector against a person, hospital or overall network. Vulnerabilities may encompass everything from unpatched backdoors to dead code that remains in a system for historical reasons and becomes exploitable.

Poor Change Management

Many software issues occur as a result of changes to the system. One issue is the level of access that needs to be coordinated between groups, and the chance that someone in one of those groups will make an error. With modern web-based, mobile and AI/ML software, changes are an even more frequent part of the software lifecycle, increasing the risk of errors.

A second set of security vulnerabilities and/or performance issues can result from usage of SOUP (software of unknown provenance), including open-source code dependencies. To accelerate bringing products to market, many software teams use off-the-shelf software or SOUP so they don’t have to reinvent the wheel and can focus on their core competencies. In some cases, software development teams may switch their SOUP because they identified a vulnerability and/or another team started using a newer version.

If developers push code to production without testing throughout the process, they can end up with a scenario where they thought response time was 0.01 seconds when it was actually 0.015 seconds, wreaking havoc on the device. Change is a good thing for identifying and fixing problems, but it needs to be managed effectively across different systems and applied consistently.
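
A response-time assumption like this can be turned into an automated check that runs on every change. The sketch below is one hedged way to do that. The function names measure_seconds and within_budget are hypothetical, the 0.01-second budget is taken from the example above, and real device timing would need to be measured on the target hardware rather than on a developer machine.

```python
import time
from typing import Callable, List

RESPONSE_BUDGET_S = 0.01  # assumed budget, from the example above


def measure_seconds(operation: Callable[[], object], runs: int = 5) -> List[float]:
    """Time an operation several times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


def within_budget(durations: List[float], budget_s: float = RESPONSE_BUDGET_S) -> bool:
    # Judge by the slowest run: a single late response can still cause harm.
    return max(durations) <= budget_s


# A run that took 0.015 seconds fails the check before it reaches production.
print(within_budget([0.008, 0.009]))  # True
print(within_budget([0.008, 0.015]))  # False
```

Checking the worst case rather than the average is a deliberate choice here, since an average can hide the occasional slow response.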

How to Reduce the Risk of Recalls and Defects

Building safer software starts with reliance on better systems, including the use of a robust, connected ALM that allows you to:

Work from the source

Manufacturers today spend more time documenting work than building quality products. They work in Jira, for example, record evidence in another system and approve the work in yet another third-party quality system. This process is not only inefficient, but can cause issues in the post-market surveillance of a product. In the event of a complaint, someone needs to go through each system to identify and isolate where the complaint originated. These steps take time that could be better spent on protecting against the issues that result in recalls in the first place.

When you work and generate evidence from source systems, you are able to create a single source of truth for the entire team to collaborate on and reference. This approach removes silos while reducing project timelines.

For example, you can generate your Software Bill of Materials directly from your source code and perform risk analysis based on the code you actually have in your applications. If someone changes the code, you are automatically notified and can assess if the risk needs to be re-examined.
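
As a rough illustration of that workflow, the sketch below builds a simplified component inventory from a pinned dependency list and reports which components changed between two revisions. It is not a standards-compliant SBOM (formats such as SPDX and CycloneDX carry far more detail). The function names and the "name==version" input format are assumptions made for this example.

```python
import hashlib
import json
from typing import Dict, List


def parse_pinned_dependencies(text: str) -> List[Dict[str, str]]:
    """Read lines of the form 'name==version' into a sorted component list."""
    components = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, _, version = line.partition("==")
        components.append({"name": name.strip(), "version": version.strip()})
    return sorted(components, key=lambda c: c["name"])


def fingerprint(components: List[Dict[str, str]]) -> str:
    """Hash the inventory so any change to it is cheap to detect."""
    canonical = json.dumps(components, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def changed_components(old: List[Dict[str, str]],
                       new: List[Dict[str, str]]) -> List[str]:
    """Names of components that were added, removed or changed version."""
    old_versions = {c["name"]: c["version"] for c in old}
    new_versions = {c["name"]: c["version"] for c in new}
    all_names = old_versions.keys() | new_versions.keys()
    return sorted(n for n in all_names if old_versions.get(n) != new_versions.get(n))


# Hypothetical library names, before and after a code change.
before = parse_pinned_dependencies("libfoo==1.0\nlibbar==2.1\n")
after = parse_pinned_dependencies("libfoo==1.1\nlibbar==2.1\nlibbaz==0.3\n")
if fingerprint(before) != fingerprint(after):
    print("Re-examine risk for:", changed_components(before, after))
```

In a real pipeline, a check like this would run on each commit and route the list of changed components to whoever owns the risk analysis.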

Shift left design, testing and validation

Shift left refers to the practice of moving testing, quality and performance evaluation earlier in the development process, often before any code is written. In this same vein, one idea attracting attention is model-based systems engineering. This approach creates models to test, through numeric simulation, that a system works before you actually build and deploy it.

While this method may not be for all manufacturers, some of the basic principles still apply, especially when you are using many moving parts and components in one software solution. Test as early as possible, test faster and perform simulations as much as practically possible before putting a system into production.

Strive for rapid change

It’s critical to have the ability to update systems very quickly to address issues as they are discovered. Identifying and immediately mitigating cybersecurity issues, for example, is key to safer software. Ensuring the right systems are in place to support the rapid change management necessary for leveraging cloud systems as well as AI and ML is also a must.

Use existing work to maintain software updates

While it may seem strange to lecture software companies about the importance of maintenance, it’s important to keep up with software updates such as security patches and other maintenance releases.

In the last 10 to 15 years, we’ve seen an immense amount of automation simplify certain types of work, such as cloud-based dev tools and code to connect infrastructure to the internet. All of this existing work frees engineers to focus on writing their unique, differentiated code in the form of life-saving new products. Without reliance on third-party software, modern software systems can’t be built rapidly and reliably.

As our dependence on software grows, we must build trust in software by ensuring it is safe and performs as expected. Every day, the products and services we provide become more complicated. As growth in population outpaces the growth in training new employees in all facets of the economy, building safer software will not only help automate certain repetitive and even dangerous human jobs, it can save lives.