If you thought 2024 might be the year discourse slows down on GenAI, well, I have bad news. No one’s reversing course on AI. It’s here to stay, and we need to work with it. Most software developers today already know this.
It’s just that AI doesn’t always work well with developers.
One of the greatest challenges developers will likely face in 2024 is avoiding the bad GenAI habits that make them worse programmers. How will they do this? The first step is to take the “large” out of large language models (LLMs) because, for serious and sensitive commercial enterprise coding, broad-purpose LLMs may just be growing a little too big.
LLMs and Coding
Although large language models offer software developers many useful tools, they can also introduce unintended problems. Because they draw from such large pools of data, you could accidentally introduce copyrighted or flawed code into your product. Coders would be wise to employ smaller, more fine-tuned models in their work.
When LLMs Become Too Big for Coding
Researchers from the University of Washington already questioned the growing size of LLMs years ago. Few, though, could deny the alluring promise of LLMs like GPT-4 for making programmers more efficient. Who would say no to faster time-to-market? The thought of developers transforming themselves into architects rather than coders is tantalizing. Plus, tools like ChatGPT are fantastic mentors for helping young coders get up to speed with programming fundamentals.
But for all their wonders, mainstream LLMs today are like giant, digital Hoovers, indiscriminately sucking up just about everything on the web. They’re not exactly transparent about where they’re sourcing data from, either — a huge source of trust issues. If you don’t know where a model is sourcing its data from, how do you know you’ve not accidentally ended up with copyrighted code or that the code is even good in the first place?
You don’t want your company launching a WiFi-enabled coffee machine, for example, only to find out six months later that some of the code generated for it is very similar (if not identical) to copyrighted code from a completely different organization. This can happen naturally when humans write code from scratch, but the chances seem to be higher when using GenAI. If even 1 percent of the code is dubious, that’s a concern. And if your product doesn’t have over-the-air updating capability, you’ll have to recall it. That’s not going to be a good day for anyone.
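One way teams might hedge against this risk is to screen generated code against a corpus of known copyrighted or license-encumbered snippets before it ships. The sketch below is purely illustrative — the snippet names and the 0.8 threshold are assumptions, and real license-compliance scanners go far beyond line-level diffing — but it shows the basic idea using only Python’s standard library.

```python
import difflib

def similarity_ratio(generated: str, reference: str) -> float:
    """Return a rough 0.0-1.0 similarity score between two code snippets,
    comparing line by line with surrounding whitespace stripped."""
    norm_gen = [line.strip() for line in generated.splitlines() if line.strip()]
    norm_ref = [line.strip() for line in reference.splitlines() if line.strip()]
    return difflib.SequenceMatcher(None, norm_gen, norm_ref).ratio()

def flag_suspect_snippets(generated: str, known_snippets: dict[str, str],
                          threshold: float = 0.8) -> list[str]:
    """Return the names of known snippets that the generated code
    closely resembles, so a human can review them before release."""
    return [name for name, ref in known_snippets.items()
            if similarity_ratio(generated, ref) >= threshold]
```

In practice you would run a check like this in CI: any flagged match blocks the merge until a developer confirms the code is original or properly licensed.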
With that in mind, you can’t blame enterprises for their discontent with generative AI’s progress. About two-thirds of C-suite executives in a recent Boston Consulting Group poll expressed that they are less than satisfied with GenAI. In my own conversations with customers we work with at Qt, I’m hearing more and more people express concern about building their products with closed-source GenAI assistants. The models simply aren’t giving development teams the tailored, quality answers they need when inputting questions into the chatbots.
Some have turned to prompt engineering to refine results, but that’s hardly the only viable option. It’s a tedious and time-consuming process, and the cost of a dedicated prompt engineer may outweigh the benefits: last year, some reported salaries as high as $300,000.
No, there’s a more cost-effective solution, and the answer lies within more specialized models.
Coders Should Look to Smaller Models for AI Assistance
Large language models aren’t the only way to succeed in AI-assisted code generation. We’re seeing increasing momentum for smaller, more focused LLMs that specialize in coding. The reason? They’re just better.
There are already lots of options on the scene, from BigCode and Codegen to CodeAlpaca, Codeium, and StarCoder. StarCoder in particular, despite being far smaller, has been found to outperform larger models like PaLM, LaMDA, and LLaMA in the quality and relevance of its results. That a smaller, fine-tuned model outperforms its bigger and more mainstream peers isn’t surprising: it was tailor-made for coding.
We will likely continue to see vendors competing with the bigger LLM companies by creating these smaller, hyper-focused models across industries, from medtech to finance and banking. Whether they will all be as good as OpenAI’s offering is debatable.
From a coder’s perspective, however, smaller models will probably be much safer: with a considerably smaller pool to draw from, there’s less risk of leaking unsecured or legally sensitive data. And it makes sense: Do you really need your LLM chock-full of extraneous information that doesn’t benefit your code writing, like who won the Nobel Prize in Literature in 1952?
Hyper-large LLMs like OpenAI’s GPT-4 are great at technical consultation, such as explaining code or why and how to use certain programming methods. None of that advice ends up directly in your production code, however. For generating code that you deliver to customers, you might want to opt for dedicated, smaller models that are pre-trained and fine-tuned on trusted content. Either way, 2024 will likely be the year developers start carefully scrutinizing which LLM they use for each task.
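That per-task scrutiny can be made concrete with a thin routing layer that sends each category of request to the model suited to it. This is a minimal sketch under assumed names — the task categories and model identifiers below are hypothetical stand-ins, not real endpoints or products.

```python
# Hypothetical model identifiers -- illustrative stand-ins only.
MODEL_FOR_TASK = {
    "explain": "general-llm",      # broad model for explanations and mentoring
    "review": "general-llm",       # code-review commentary, never shipped
    "generate": "small-code-llm",  # fine-tuned code model for production code
    "complete": "small-code-llm",  # inline completion inside the IDE
}

def pick_model(task: str) -> str:
    """Route a task category to an appropriate model, defaulting to the
    conservative code-focused model for anything unrecognized."""
    return MODEL_FOR_TASK.get(task, "small-code-llm")
```

The design choice worth noting is the default: when a request doesn’t fit a known category, it falls through to the smaller, trusted-content model rather than the broad one, keeping unvetted output away from production code.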
DevOps teams would therefore do well to thoroughly research all the options available on the market, rather than defaulting to the most visible ones. The smaller the data pool, the easier it is to keep things relevant to the work of coding, and the cheaper the model is to train, too. The rise of smaller language models may even incentivize providers of LLMs to improve transparency.
Suit the Tool to the Task
No GenAI tool (ChatGPT included) is a substitute for real programmers; these tools can’t be relied on as a foolproof solution for cranking out high volumes of code.
That isn’t to say GenAI won’t transform the DevOps landscape in the years to come, but if there’s a future where GenAI eliminates the need for human supervision, we’re not anywhere near it. Developers will still have to treat every line of code like it’s their own and ask peers the same question they always should: “Is this good code or bad code?”
But since we will inevitably have to work more closely with AI to meet the world’s growing software demands, we should at least make sure AI works for the developers, not the other way around. And sometimes that will mean looking for an LLM that isn’t necessarily the biggest — or the most popular — but the one fit for the coding task at hand.