Bestow has long used traditional predictive algorithms, building and deploying machine learning models that predict customers' purchase propensity, identify the best insurance product for each customer, and reduce underwriting costs for our partners.
3 Things We Learned Building This Chatbot
- Stay curious. Rolling up your sleeves and putting your hands on the keyboard is the best way to understand what is available to advance your business.
- Use what’s available. Dig into what cutting-edge technology is available; don’t reinvent the wheel if you can avoid it.
- Keep going. The possibilities are limitless, and it’s fun to see what new iterations you can pull together.
As innovation in generative AI (OpenAI’s ChatGPT, Hugging Face, Google’s Bard) picked up this year, our chief product officer challenged engineering leadership to create a chatbot that uses generative AI. Challenge accepted! With the advent of large language models (LLMs) and my long-standing interest in AI, the timing was perfect for me and my team to roll up our sleeves with generative AI. Here’s how we did it.
Determine Practical Applications
Like any company, Bestow deals with long documents, whether in engineering, underwriting or marketing. We felt it would be exciting, and potentially a real efficiency gain, to apply the power of LLMs in a company-specific context.
While we admired the power of ChatGPT, we realized that its UI did not let us create a custom context from our documents. After surveying the best opportunities to deploy generative AI at Bestow, we decided to enable employees to “chat” with their documents: the user would upload a document, and the app would let them ask it questions. ChatGPT did not offer this functionality out of the box, and we saw practical applications for it at Bestow.
Find the Right Tools
First, we had to create the context in which the LLM would work. The ChatGPT UI lets you specify a context, but only as plain text, so we started researching tools that could build context from our documents. Enter LlamaIndex.
We built the initial version of the application using Streamlit for the front end and LlamaIndex for the back end. This first version allowed the user to upload a document and ask questions about that document. This initial POC impressed us with how fast we could bootstrap this application. Using a limited understanding of neural networks and average software engineering skills, we created a chatbot that answered questions based on a document the user uploaded, all in about a month. Pretty cool!
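Under the hood, the pattern is straightforward: chunk the document, retrieve the most relevant chunk, and hand it to the LLM as context. A minimal sketch of that flow, with a naive keyword retriever and a hypothetical stub standing in for LlamaIndex and the LLM call:

```python
# Sketch of the upload-and-ask flow. The retriever and "LLM" here are
# toy stand-ins; in the real app LlamaIndex builds the index and the
# LLM generates the answer.

def chunk(text, size=200):
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(chunks, question):
    """Rank chunks by how many question words they contain."""
    words = set(question.lower().split())
    return max(chunks, key=lambda c: len(words & set(c.lower().split())))

def answer(document, question):
    """Build a context from the best-matching chunk, then 'ask' the LLM."""
    context = retrieve(chunk(document), question)
    return f"Answering {question!r} using context: {context[:40]}..."

doc = "Bestow offers term life insurance. Underwriting is fully digital."
print(answer(doc, "What kind of insurance does Bestow offer?"))
```

The real application swaps the stub retriever for a LlamaIndex query engine, but the shape of the request is the same: question in, grounded context, answer out.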
Stay Curious About Further Possibilities
Building these initial features and meeting our chief product officer’s challenge energized us, so we decided to build on the foundations with additional applications.
To determine where to go next, we once again got curious. After asking ourselves how to make this technology more relevant and valuable for the life insurance industry, we landed on enabling multi-document queries.
To implement this, it was no longer feasible to run everything in memory, so we needed to introduce a persistence layer to store the different documents. Our research uncovered a lot of buzz around Pinecone, so we hopped on the bandwagon. Pinecone’s free tier served our purposes well enough: a single index for the application, with each new document’s vectors added to that index. Voila, we could query across multiple documents.
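The idea behind the single shared index can be sketched without any external services: each document becomes one vector in the index, and a query returns the nearest document. Here bag-of-words vectors and cosine similarity stand in for real embeddings and for Pinecone; all names are illustrative:

```python
import math

# Toy sketch of the single-index, many-documents design: one vector per
# document, nearest-neighbor lookup at query time. Bag-of-words vectors
# stand in for the embeddings that Pinecone stores in the real app.

DOCS = {
    "underwriting.txt": "underwriting rules for term life policies",
    "marketing.txt": "campaign plan for the fall product launch",
}

# Shared vocabulary so every vector has the same dimensions.
VOCAB = sorted({w for text in DOCS.values() for w in text.split()})

def embed(text):
    """Bag-of-words vector over the shared vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# "Upsert" every document's vector into one index, as with Pinecone.
index = {name: embed(text) for name, text in DOCS.items()}

def query(question):
    """Return the name of the document nearest to the question."""
    qv = embed(question)
    return max(index, key=lambda name: cosine(qv, index[name]))

print(query("underwriting rules term life"))  # → underwriting.txt
```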
From there, it was hard to contain our curiosity. What if multiple users could create their own contexts to query? So we introduced the concept of multiple users to the application. This worked well, except we would then have needed a Pinecone index per user to store each user’s documents. That would have been too expensive for a hobby project, so after a bit of research and some experimentation, we decided on MongoDB to store the documents. Here is how we accomplished that:
```python
from llama_index.node_parser import SimpleNodeParser

# Parse the uploaded documents into nodes, then persist the nodes in
# the MongoDB-backed document store.
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
storage_context.docstore.add_documents(nodes)
```
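The multi-user design boils down to namespacing documents by user rather than provisioning an index per user. A minimal sketch of that idea, with a plain dict standing in for the MongoDB collection (all names hypothetical):

```python
# Sketch of per-user document storage: one shared store keyed by user
# id, instead of one Pinecone index per user. A dict stands in for the
# MongoDB collection used in the real app.

docstore = {}  # user_id -> {doc_name: text}

def add_document(user_id, doc_name, text):
    """Store a user's document under their own namespace."""
    docstore.setdefault(user_id, {})[doc_name] = text

def list_documents(user_id):
    """Documents visible to one user only - their private context."""
    return sorted(docstore.get(user_id, {}))

add_document("alice", "policy.txt", "term life policy details")
add_document("bob", "claims.txt", "claims process overview")
print(list_documents("alice"))  # → ['policy.txt']
```

One shared store with a user-id key scales to many users at roughly constant cost, which is what made MongoDB the cheaper choice here.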
Once we had met the challenge issued by our chief product officer, we accepted that further generative AI projects would have to happen on late weeknights, when we could find the time. Then Google hosted a GenAI Tech Day for Google Cloud Platform, which led to the formation of a new generative AI guild at Bestow. The news that VertexAI honors the same data governance, privacy and security policies as the rest of the GCP platform opened up new possibilities for GenAI at Bestow.
This meant we could take this side project and use it to analyze massive documents at Bestow. For this version, we took the original application and adopted the bells and whistles GCP’s VertexAI had to offer: we swapped LlamaIndex’s vector store for langchain’s, and we started using VertexAI’s embeddings.
```python
from langchain.embeddings import VertexAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

# Build an in-memory vector index over the loaded document, using
# VertexAI's embeddings.
embeddings = VertexAIEmbeddings()
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([data])
```
This method turned out to be slow. Luckily, Bestow’s data scientists and data engineers quickly improved our initial implementation, speeding up document processing and question answering. We then set up a multi-step mechanism that uses VertexAI’s LLM:
```python
from langchain.llms import VertexAI

# Configure VertexAI's 32k-context text model for question answering.
vertex_llm_text = VertexAI(
    model_name="text-bison-32k",
    temperature=0,
    request_parallelism=50,
    max_output_tokens=8000,
)
```
We create a context for every page of the uploaded document, execute the same question prompt against each page, and combine the per-page answers with a combine prompt, in a map-reduce pattern. This approach retrieves an answer from each page of the uploaded document and cites the source of each answer.
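The map-reduce flow described above can be sketched as follows; a stub function stands in for the VertexAI LLM call, and the prompt wording is illustrative:

```python
# Sketch of the per-page map-reduce pattern: ask the same question of
# every page (map), then combine the per-page answers with a second
# prompt (reduce). A stub stands in for the VertexAI LLM call.

def ask_llm(prompt):
    """Stub LLM: in the real app this calls VertexAI's text model."""
    return f"[answer to: {prompt[:30]}...]"

def answer_question(pages, question):
    # Map: run the question prompt against each page's context,
    # keeping the page number so answers can cite their source.
    page_answers = [
        (n, ask_llm(f"Context: {page}\nQuestion: {question}"))
        for n, page in enumerate(pages, start=1)
    ]
    # Reduce: merge the per-page answers with a combine prompt.
    combined = "\n".join(f"page {n}: {a}" for n, a in page_answers)
    return ask_llm(f"Combine these answers:\n{combined}\nQuestion: {question}")

pages = ["page one text", "page two text"]
print(answer_question(pages, "What does the document say?"))
```

The map step parallelizes naturally across pages, which is why the pattern speeds up question answering over very large documents.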
For now, this tool helps Bestow employees in specific disciplines be more efficient and gives those of us at the company a place to apply our passion for GenAI.
As with any moonlighting project, passion for the subject matter is what drives innovation. Bestow’s GenAI Guild rolled out this feature mainly in their own time, working nights and weekends. We are proud that internal users rely on our product, and we hope that we will soon be able to include our chat-with-documents tool as part of our enterprise offering.