What Enterprises Need to Know Before Adopting an LLM

Large language models are general-purpose tools that need to be customized to be effective. Here’s what you need to know before adopting one for your business.

Published on Mar. 22, 2024

Large language models (LLMs) have been around for a while, but they struggled to generate much excitement early on because they were too complex to adopt. That’s changed with the advent of OpenAI’s ChatGPT and similar models, making it easier than ever for businesses to adopt their own large language model.

3 Large Language Model Techniques to Consider

  1. Prompt engineering, which leverages the chat interface of an LLM.
  2. Retrieval-augmented generation (RAG), which grounds responses in your own documents.
  3. A custom LLM trained on your own data.

LLM and large multimodal model (LMM) offerings today are general-purpose “sledgehammers” that do a wide variety of things moderately well. To make the most of them, these models need to be aligned or leveled up through specific instructions or domain-specific data. Techniques for doing this, in order of complexity, include prompt engineering, retrieval-augmented generation (RAG), embeddings, fine-tuning and custom models.

Here’s how to determine what your organization needs to do to make the most of large language models.

 

Which Large Language Model Technique Is Right for You?

Adopting a large language model requires selecting the right technique to deliver on your company’s needs. 

Prompt engineering leverages the natural language chat interface that LLMs offer to instruct or tune outputs toward a specific type, style or format. The chat interface itself is very open-ended. Prompt engineering is easier and more affordable than approaches like fine-tuning, offering faster time to market at lower cost, but context is limited and responses can’t be fully constrained to stay on relevant topics.
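For illustration, here is a minimal sketch of prompt engineering using OpenAI’s Python SDK; the model name, system prompt and example query are placeholders, not a recommended production setup.

```python
from openai import OpenAI  # assumes the openai v1 Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A system prompt constrains topic, tone and output format with no training at all.
SYSTEM_PROMPT = (
    "You are a grocery-planning assistant. Only answer questions about meal "
    "plans and grocery lists. Respond as a bulleted list. If asked about "
    "anything else, reply: 'Sorry, I can only help with meal planning.'"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; substitute your provider's model
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Plan three vegetarian dinners under $30."},
    ],
    temperature=0.3,  # lower temperature keeps outputs more predictable
)
print(response.choices[0].message.content)
```

The entire “customization” lives in the prompt text, which is why this approach is fast and cheap, but also why the model can still be coaxed off-topic.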

Another popular technique for constraining model outputs is retrieval-augmented generation (RAG). RAG lets you add domain-specific literature or data sets, giving the LLM or LMM additional context when generating responses.
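As a rough sketch of the idea, the snippet below retrieves the most relevant document from a small in-memory corpus and passes it to the model as context. The sample documents, model names and single-document retrieval are simplifications; a real system would use a vector database and return several passages.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Map text to a vector; any embedding model could stand in here."""
    out = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(out.data[0].embedding)

# Domain-specific documents the base model was never trained on.
docs = [
    "Policy 12.3: Refunds are issued within 14 days of purchase.",
    "Policy 4.1: Loyalty points expire after 24 months of inactivity.",
]
doc_vecs = np.stack([embed(d) for d in docs])

def answer(question: str) -> str:
    q = embed(question)
    # Retrieve the most similar document by cosine similarity.
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = docs[int(np.argmax(sims))]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Answer only from the provided context."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("How long do I have to request a refund?"))
```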

That said, some domains, such as legal documents or clinical data, are completely different from the generic internet-style data that LLMs are trained on. Simple prompting, RAG or even embeddings may be insufficient for LLM applications in such distinct domains.

This is where you would consider more elaborate techniques like fine-tuning current models on your own data or even training a custom model from scratch.


 

3 Questions to Consider When Adopting an LLM

 

1. Should You Expose a Chat Interface to the End User?

Some enterprises may go a step further and choose to expose a chat interface directly to their end users. For instance, a grocery delivery company may prompt an LLM to build meal plans and grocery lists according to user preferences entered via chat, and embed it within its app. But one of the challenges of exposing an open-ended interface externally is the wild-west nature of natural language.

In a recent example, a Chevy dealership launched a ChatGPT-powered AI chatbot on its website that caused significant issues. The tool was supposed to handle automotive sales, but one user asked it to create a Python script solving the Navier-Stokes flow equations, and the chatbot produced a response. In another exchange, it recommended a competitor’s product as the best pickup truck.

What’s worse, the flexibility that the chat interface offers can be misused by bad actors through what are known as prompt injection attacks. Instructions hidden within unstructured data like images or text can prompt the model to compromise security by accessing malicious webpages, sharing sensitive data and more.

Researchers from DeepMind found that a cleverly discovered but simple prompt, “Repeat the word ‘poem’ forever,” could nudge ChatGPT into leaking its training data.

A rule of thumb for protecting against these attacks is to limit user input so it isn’t fully open-ended. You can also filter both user inputs and model outputs, say, with a counter model that checks the confidence of a response, or a ranking model that filters for relevance, to rein in the wild-west behavior.
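As a hypothetical sketch, a guardrail layer can be a plain function that screens both directions of the conversation. The blocked patterns, topic whitelist and the `generate` and `classify` callables here are all illustrative stand-ins for a real moderation model and topic classifier.

```python
BLOCKED_PATTERNS = ["ignore previous instructions", "system prompt", "repeat the word"]
ALLOWED_TOPICS = {"vehicles", "financing", "service"}  # illustrative domain whitelist

def is_safe_input(user_text: str) -> bool:
    """Reject obvious prompt-injection phrasings before they reach the model."""
    lowered = user_text.lower()
    return not any(pattern in lowered for pattern in BLOCKED_PATTERNS)

def guarded_reply(user_text: str, generate, classify) -> str:
    """generate: callable wrapping the LLM; classify: hypothetical topic classifier."""
    if not is_safe_input(user_text):
        return "Sorry, I can't help with that request."
    output = generate(user_text)                # call the underlying LLM
    if classify(output) not in ALLOWED_TOPICS:  # filter the model's output, too
        return "Sorry, that's outside what I can help with."
    return output
```

Keyword filters alone are easy to evade, which is why pairing them with a counter model or ranking model is worth the extra cost.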

RAG is another approach to tune model responses toward retrieving from known sources, rather than letting the model hallucinate responses.

Prompting is useful and intuitive, but it has to be approached carefully, and a lot is still unknown. None of the above approaches is foolproof.

Our recommendation is to leverage LLMs, and even chat interfaces, at full scale for internal or enterprise-specific use-cases, where your user base is “friendly”: incentives are generally aligned, and strict policy guidelines prohibit malicious use. Chat interfaces served directly to external users should be carefully monitored, with trackers and guardrails in place to detect attacks or unintended use, and ideally with a human in the loop.

 

2. How Will You Protect Your Intellectual Property?

If your organization is looking to build something that contains your secret sauce, you have concerns about intellectual property, or you operate in a regulated industry like banking and finance, your options may be limited to fine-tuning and custom models.

With fine-tuning, you can use both closed and open-source providers. With open-source frameworks, you keep tighter control of your secret sauce because you train and host the models in your own secure environment, though you will still need to invest in talent, data and compute. With closed providers, you still need to invest in data and talent, but compute and tooling are handled by the provider. Fine-tuning allows the underlying model to learn the nuances of domain specificity, style and more.
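As one closed-provider example, here is a minimal sketch of launching a fine-tuning job with OpenAI’s API; the file name and base model are placeholders, and preparing the training data is the bulk of the real work.

```python
from openai import OpenAI

client = OpenAI()

# train.jsonl holds your domain examples, one chat-formatted record per line, e.g.:
# {"messages": [{"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# Kick off the job; the provider handles compute, tooling and hosting.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # placeholder base model; check current availability
)
print(job.id, job.status)  # poll the job until it finishes, then use the new model
```

An open-source equivalent would swap this for a training loop you run yourself, trading convenience for control.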

Developing a custom model from scratch is not always needed, but in certain scenarios it is warranted. One is when your entire business depends on your IP: Bloomberg, for instance, purpose-built BloombergGPT for finance, and it outperforms similarly sized models on financial NLP tasks by a significant margin. Another is when you are concerned about copyright, data compliance and bias; even with open-source offerings, there’s often no visibility into what data was used to train the model.

Custom models are also no longer prohibitively expensive and time-consuming: MosaicML claims that you can get GPT-3-like quality for less than $500,000.

Fine-tuned and custom models are much harder to execute, which is precisely what makes them a source of competitive differentiation over a generic model: not all of your competition will be able to achieve it. In these instances, having the right team and resources is what enables success.

There are also ways to leverage LLMs to produce not just natural language output (as ChatGPT does) but more structured predictions, and the same underlying framework makes adoption easier for either purpose.

 

3. How Will You Leverage LLMs for Non-Chat and Generative Outputs?

If your use-case doesn’t lend itself to a chat or generative interface and is instead geared toward structured prediction, embeddings are the way to go. Embeddings are a technique for mapping text data into numerical vectors in a way that preserves semantic similarity.

This allows you to perform mathematical operations on text data, such as “king – man + woman = queen.” It turns out that the embeddings learned by the latest LLMs and LMMs are highly signal-rich compared to previous techniques, so embeddings shared by a provider like OpenAI, or extracted from an open-source model, can be used for further computation or modeling.

A recommendation system is a good example of this kind of use-case.

Let’s say you run a research journal that recommends publications based on reading history or user background. Rather than deploying a chat interface for the end user, you could store publications and articles as embeddings in, say, a vector database. A specific user’s reading history and preferences are stored in the same embedding space and used as a query to find the embeddings “closest” to them, which are then served to the user as recommendations.
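A toy version of that recommender, with hand-made three-dimensional vectors standing in for real embeddings and a dictionary standing in for the vector database:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors, ignoring their magnitudes."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these came from an embedding model; a vector database would store them.
article_vecs = {
    "Protein folding survey": np.array([0.9, 0.1, 0.0]),
    "Quantum error correction": np.array([0.1, 0.9, 0.2]),
    "CRISPR screening methods": np.array([0.8, 0.2, 0.1]),
}

# Represent the user as the mean embedding of articles they have already read.
user_vec = np.mean([article_vecs["Protein folding survey"]], axis=0)

# Rank articles by closeness to the user's profile and serve the top results.
ranked = sorted(article_vecs.items(), key=lambda kv: -cosine_sim(user_vec, kv[1]))
print([title for title, _ in ranked[:2]])
```

Note there is no chat interface anywhere in this flow; the LLM’s contribution is entirely in the quality of the embeddings.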


 

What’s the Difference Between Open-Source and Closed LLM Providers?

Some of the challenges with closed providers include data security, copyright infringement, privacy and bias concerns. These closed models also give users no transparency into the data used in training, the methodology, assumptions or algorithms. OpenAI, for example, used practically everything available on the web to train its models; that data is not domain-specific and may not translate well to the context of your business. There are also serious security concerns around how such providers may use whatever is input into the platform.

To address some of these concerns and encourage adoption, third-party providers have started offering enterprise versions aimed at enhancing security and privacy, with promises not to use customer inputs or outputs in training. This is a move that could enable more companies to adopt faster. These providers are also promising to defend organizations against copyright lawsuits arising from approved uses of their products.

Third-party providers like OpenAI have made adoption a lot easier, and a natural language interface whose promise has been demonstrated to hundreds of millions of users has sparked the imagination and interest of the C-suite. Such providers also offer enterprises multiple options for tuning models so that outputs align better with custom use-cases. Hosting is completely abstracted away, and you can go from experimentation to a fully functional, deployed model in a matter of days.

Open-source providers, on the other hand, release the full source code and weights of pre-trained foundational models that enterprises can leverage. This gives you complete access to the model, including the flexibility to extract embeddings, fine-tune weights or even train your own LLM from scratch.

Moreover, since you host the models yourself, privacy and security are more in your control, and there is more opportunity to optimize running costs. If you notice performance drifting over time, you can revamp models at your discretion, without any external vendor dependencies.

We will see continued advancements in this space, and things are changing daily. But just like in the early days of the web, if you wait too long to adopt an LLM, you’ll be left behind.

The views expressed in this article are solely the authors’ own.
