My journey with AI began by exploring how I could leverage ChatGPT to assist with my coding. You can read all about my initial learnings. Stepping into the world of AI, I quickly grasped the enormous potential of domain-specific chatbots. These advanced tools are not just about automating conversations; they open doors to a myriad of applications, including revolutionizing customer service, increasing developer productivity, and personalizing user experiences to a remarkable degree.
Background
In this project, my team and I crafted a smart AI chatbot, specifically tailored for the Focused Labs website. Our aim extended beyond just devising a simple chat interface; we envisioned a comprehensive “Knowledge Hub” that fully leverages the potential of natural-language models. Such an AI-powered Enterprise Knowledge Hub uniquely integrates diverse information sources into a unified, readily accessible platform. Our goal was to synergize data from our Notion wiki and information from our website, thereby establishing a comprehensive Focused Labs virtual assistant.
Here are the top six insights I uncovered while building a domain-specific custom AI chatbot.
1. Interplay of Multiple AI Models
My key insight is that building a smart AI chatbot is not about relying on a single, all-knowing AI model. Instead, it's like conducting an orchestra: different models collaborate, each playing their unique part to create a symphony of precise responses and meaningful interactions. Our setup comprises three distinct models, each with a specific role. An embedding model processes our proprietary data, a completion model handles the text we retrieve from databases, and a chat model directly engages with the raw input provided by users. Together, these models form a more effective and comprehensive system. For a deeper understanding of this architecture, refer to our blog, Basic Architecture of a Domain Specific Custom AI, that covers this topic.
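The division of labor among the three models can be sketched in a few lines of Python. Everything here is a stand-in: the helper functions are stubs that mimic what real embedding, completion, and chat model calls would do, so the orchestration reads clearly without an API key.

```python
# Sketch of the three-model pipeline. All helpers are illustrative stubs,
# not the actual Focused Labs implementation.

def embed(text: str) -> list[float]:
    """Embedding model: turn proprietary text into a vector (stubbed)."""
    return [float(len(word)) for word in text.split()]  # placeholder embedding

def summarize_retrieved(chunks: list[str]) -> str:
    """Completion model: condense raw chunks pulled from the vector database."""
    return " ".join(chunks)  # a real system would call a completion endpoint

def answer(user_question: str, context: str) -> str:
    """Chat model: respond to the user's raw input, grounded in the context."""
    return f"Based on: {context!r}, answering: {user_question!r}"

# The orchestration: embed the question, retrieve, condense, then chat.
question = "What are the Focused Labs values?"
query_vector = embed(question)             # embedding model
retrieved = ["Listen First", "Learn Why"]  # stand-in for a vector-DB lookup
context = summarize_retrieved(retrieved)   # completion model
reply = answer(question, context)          # chat model
```

The point is the hand-offs: each model only ever sees the kind of input it is best at, and no single model has to do everything.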
2. Vector Embeddings: The Heart of Custom Data
Vector embeddings are critical when dealing with custom data. In fact, these embeddings essentially form the backbone for AI models to comprehend proprietary information.
Traditional "training" or fine-tuning of AI models involves a complex and expensive process of exposing the models to vast quantities of domain-specific data. However, this approach doesn't reliably deliver the specific, accurate responses we're seeking.
This is where "prompt engineering" proves its worth. Unlike the hefty requirements of model training, prompt engineering simply involves carefully crafting the input prompts provided to the AI. By providing a well-defined context or prompt, we can guide the model's responses effectively.
Unfortunately, we are limited by the model's context window (the number of tokens we can ask an AI to ingest at one time). Thus, we are not able to include all of the potential domain-specific context needed for a good quality answer directly in the prompt. So, we pre-organize and store the data in a way that helps the AI model to grasp the meaning and context of words in human language. We are then able to retrieve smaller pieces of relevant information from the data stores and add those specific pieces to a prompt for a language model to process.
These pre-organized data stores are called vector databases. Think of these databases to be like a library, but instead of organizing books by their exact titles or authors, the books are organized based on their style, themes, or the emotions they evoke. The “books” are the raw text together with a computer-readable representation of the data called a vector embedding. Vector embeddings are like detailed book summaries that encapsulate complex descriptions, making it possible for the library to categorize and recommend based on similarities beyond just title or author.
In more technical terms, vector embeddings are lists of numbers that represent real-world concepts. These numbers quantify various characteristics of the data. Then, we can measure the distance between vectors in the vector space to evaluate similarity. I recommend reading this Pinecone article for a deeper technical definition. (Pinecone is a vector database.)
Vector embeddings with prompt engineering are both simpler and more effective for our use case, helping us to generate accurate responses without the complications and costs associated with traditional model training. To see practical examples of prompt engineering, check out this blog post about Small Steps Toward Effective Prompt Engineering.
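Putting the two ideas together, retrieval plus prompt engineering looks roughly like the sketch below. The template wording is our own illustration of the pattern, not the production prompt.

```python
# Hedged sketch of retrieval-augmented prompt construction. The template
# text is illustrative, not the actual prompt used in production.

PROMPT_TEMPLATE = """You are a helpful assistant for the Focused Labs website.
Answer the question using ONLY the context below. If the context does not
contain the answer, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Stitch the most relevant chunks from the vector database into a prompt."""
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    "What are the Focused Labs values?",
    ["Our values: Listen First, Learn Why, Love Your Craft."],
)
```

Only the handful of retrieved chunks travel with the question, which is how the approach stays inside the context window that full-corpus prompting would blow past.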
3. The Unique Challenge of Testing
Shifting from traditional software testing, the evaluation of an AI system demands a new perspective. Testing involves a more holistic approach rather than segmenting the system into smaller, separated parts. Classic software testing methods, such as unit testing, aren't up to the task. Instead, the testing process more closely resembles baking a cake: the impact of altering a single ingredient can't truly be evaluated until the entire process is complete and the final product is assessed.
Following the time-honored scientific method is a better testing pattern. Create a rubric that records a hypothesis, details the data input, and documents the methods, algorithms, libraries, and tools used. Subsequently, evaluate the responses from the AI system, draw conclusions, and encapsulate the findings.
For our use case, we created 2 sets of 6 questions based on information available on the Focused Labs website and the Focused Labs Notion wiki. We included both specific and broad questions and created an answer key. We ranked each answer from our chatbot on a scale of 1-5 for correctness.
When evaluating your AI system, be prepared for it to sometimes give partially correct answers. For instance, we asked “What are the Focused Labs’ values?” The chatbot correctly answered “Listen First”, “Learn Why” and “Love Your Craft.” However, the model also returned two additional concepts as values. We’d rate an answer like this a 4. The LLM had the right answer, but it gave additional information.
We would then make small iterations in our implementation. For example, we removed all emojis from our text, and then asked our chatbot these same 6 questions and compared the results.
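The evaluation loop above can be sketched as a small harness. The question, answer key, rubric thresholds, and stubbed chatbot here are all illustrative; in practice we graded answers by hand against our answer key.

```python
# Illustrative scoring harness. The chatbot stub, answer key, and rubric
# thresholds are invented for demonstration; real grading was done by hand.

def chatbot(question: str) -> str:
    """Stand-in for the real chatbot under test."""
    canned = {
        "What are the Focused Labs' values?":
            "Listen First, Learn Why, Love Your Craft, plus transparency.",
    }
    return canned.get(question, "I don't know.")

def score(answer: str, expected_phrases: list[str]) -> int:
    """Crude 1-5 rubric: all expected phrases present scores 5, or 4 if the
    answer also includes unrequested extra material."""
    hits = sum(phrase in answer for phrase in expected_phrases)
    if hits == len(expected_phrases):
        extra = len(answer) > sum(map(len, expected_phrases)) + 10
        return 4 if extra else 5
    return max(1, 1 + 4 * hits // len(expected_phrases))

answer_key = {
    "What are the Focused Labs' values?":
        ["Listen First", "Learn Why", "Love Your Craft"],
}

results = {q: score(chatbot(q), expected) for q, expected in answer_key.items()}
```

After each implementation change (such as removing emojis), re-running the same harness and diffing `results` against the previous run gives a repeatable, if coarse, signal.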
While this method is repetitive and monotonous at times, it clearly tests the effectiveness of leveraging an LLM in bespoke code.
4. Think Less Like a Programmer and More Like a Linguist
To improve accuracy, efficiency, reliability, and quality of insights, we explored various data cleansing techniques. One specific technique involved normalizing the data by converting all words to lowercase. After presenting our routine set of questions to the AI virtual assistant and comparing results with previous experiments, we were surprised by the outcome. This seemingly insignificant change led to a substantial decrease in the AI's performance. This unexpected result pushed us deeper into the nuanced complexities of data interpretation, prompting a shift in perspective.
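Treating each cleansing step as a toggleable experiment, rather than an assumed good, is the practical takeaway. The pipeline below is a hypothetical illustration of that idea; the helper and its emoji filter are our own simplifications.

```python
# Hypothetical data-cleansing pipeline with toggleable steps. Each toggle is
# an experiment to be measured, not an assumed improvement.
import re

def cleanse(text: str, lowercase: bool = False, strip_emoji: bool = False) -> str:
    if strip_emoji:
        # Rough emoji filter: drop characters outside the Basic Multilingual Plane.
        text = "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    if lowercase:
        text = text.lower()  # this step measurably HURT answer quality for us
    return re.sub(r"\s+", " ", text).strip()

raw = "Love Your Craft \U0001F6E0  We sweat the details."
print(cleanse(raw, strip_emoji=True))                  # casing preserved
print(cleanse(raw, strip_emoji=True, lowercase=True))  # casing (meaning) lost
```

Running the same evaluation questions against each variant is what revealed that casing carries meaning the model uses.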
Instead of adhering to a traditional programmer’s perspective that perceives data as mere strings of characters or simply 0s and 1s, learn to value the significance of elements like casing, punctuation, and even emojis. These might previously have been dismissed as trivial, but they play crucial roles in communication. Embrace a linguist's mindset and critically evaluate which parts of the data contribute to the conveyance of ideas and thoughts.
Pivot towards techniques that promote a stronger connection between ideas. Remember, data cleaning is not a mundane process of managing uniform bytes. Rather, it's an opportunity to enrich the ideas and the meaning inherent in the information, ultimately enhancing the accuracy and effectiveness of your AI chatbot. Treat data not merely as raw material, but as a rich medium for nuanced expression and connection.
5. Harnessing the Power of Langchain/Llama Index
Langchain and Llama Index are the leading tools for leveraging Large Language Models (LLMs). They are frameworks allowing language models to link with diverse data sources and engage with a multitude of tools or resources. Langchain and Llama Index offer modular and easy-to-use components with off-the-shelf configuration.
To dive deeper into one of these technical components, Langchain agents-with-tools are instrumental for grappling with a variety of question types. These agents provide the flexibility to handle a diverse range of queries, thereby augmenting the chatbot's versatility and enriching user experience.
During configuration of these agents, a key piece of advice is to pay close attention to the descriptions of your tools. These descriptions significantly impact your chatbot's behavior and subsequently, its effectiveness. Include details for each tool's intended use.
For instance, in our initial setup, the descriptions for our two data sources were too brief, which led to frequent misapplications of the tools and consequently, incorrect answers from the chatbot.
Inadequate Tool Descriptions
Notion Data Source Description: "Focused Labs internal knowledge from Notion."
Website Data Source Description: "Focused Labs knowledge scraped from website."
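The contrast between a too-brief description and a richer one can be sketched as below. We use a plain dataclass rather than Langchain's own tool class to keep the example self-contained, and the "improved" wording is our hypothetical rewrite, not the exact production descriptions.

```python
# Sketch contrasting brief vs. detailed tool descriptions. The improved
# wording is a hypothetical rewrite for illustration.
from dataclasses import dataclass

@dataclass
class ToolSpec:
    name: str
    description: str  # the agent reads this text to decide which tool to call

too_brief = [
    ToolSpec("notion", "Focused Labs internal knowledge from Notion."),
    ToolSpec("website", "Focused Labs knowledge scraped from website."),
]

# Spelling out each tool's intended use helps the agent route questions.
improved = [
    ToolSpec(
        "notion",
        "Use for questions about Focused Labs internal operations: "
        "policies, onboarding, engineering practices, and company values "
        "documented in the private Notion wiki.",
    ),
    ToolSpec(
        "website",
        "Use for questions a prospective client might ask: services "
        "offered, case studies, blog posts, and public contact details "
        "from the company website.",
    ),
]
```

Because the agent chooses tools by reading these strings, the description is effectively part of the prompt, and vague descriptions invite misrouted queries like the ones we saw.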