
Enhancing AI Apps with Streaming: Practical Tips for Smoother AI Generation

Discover how to elevate your AI app with streaming technology. Learn practical tips for seamless idea generation and improve your user experience.

Dec 14, 2023

By Kaylynn Watson


Introduction

Interested in building an AI-powered app using generative models? You're on the right track! The realm of generative AI is brimming with untapped potential. A popular approach is to prompt the AI to generate a list of ideas based on provided context, allowing users to select their preferred option and then refine and expand.

An example of a generative AI app is one my team and I created for copywriters to seamlessly integrate storytelling into marketing emails. This user-friendly wizard elevates email creation, combining the power of generative AI with the user’s own expertise and creativity for impactful results. Our development journey led to the integration of streaming technology, significantly reducing AI response times.

In this post, I will cover two main tips for enhancing the user experience of AI-driven apps through effective use of streaming. Let’s dive in!

Tech Stack

  • Backend: Node.js with TypeScript and Express
  • Frontend: React with TypeScript
  • AI Integration: OpenAI’s Node SDK and GPT-4

Let’s start with the basics.

First, we send requests to OpenAI using their Node.js SDK. Because of how the prompt is phrased, the AI responds with a numbered list (bullets like “1.”), and we use that numbering to parse the response into separate options.

import OpenAI from "openai";

require("dotenv").config();

const openai = new OpenAI({
    apiKey: process.env.OPEN_AI_API_KEY,
});

const chatModel = "gpt-4";

export const createIdeas = async ({ occasion }: { occasion: string; }) => {
    const completion = await openai.chat.completions.create({
        messages: [
            {
                role: "user",
                content: `I want to host an event for the following occasion: ${occasion}. Write me a list of 4 separate ideas for this event`,
            },
        ],
        model: chatModel,
    });

    const content = completion.choices[0].message.content;
    return { ideas: parseIdeas(content || "") };
};

// Use the bullet number format (ex: "1.") to split the ideas into individual elements in an array
const parseIdeas = (text: string): string[] => {
    const parts = text.split(/[0-9]+\./gm);
    // Drop everything before the first numbered bullet
    return parts.slice(1);
};
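
To make the parsing concrete, here is a hypothetical input and its result (the idea text is invented for illustration):

// Hypothetical example of parseIdeas in action
parseIdeas("1. Host a trivia night \n\n2. Plan a karaoke party");
// => [" Host a trivia night \n\n", " Plan a karaoke party"]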

Then, let’s return the parsed ideas to the frontend. This example endpoint uses Express.

app.post("/generate-ideas", async (req: Request, res: Response) => {
    const { occasion } = req.body;
    const generatedIdeas = await createIdeas({ occasion });
    res.send(generatedIdeas);
});

This produces a nicely formatted JSON response that is easy to pass into the appropriate UI components.

{
    "ideas": [
        "Example first idea \n\n",
        ...
    ]
}
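
On the frontend, a client for this non-streaming endpoint can be as simple as the following sketch (fetchIdeas is a hypothetical helper name; the base URL environment variable matches the streaming client shown later):

// Hypothetical non-streaming client: fetch the complete list of ideas
export const fetchIdeas = async (occasion: string): Promise<string[]> => {
    const res = await fetch(`${process.env.REACT_APP_API_BASE_URL}/generate-ideas`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ occasion }),
    });
    const { ideas } = await res.json();
    return ideas;
};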

The Challenge: Latency

The hiccup? A waiting time of up to 30 seconds before users can view the AI’s suggestions. Watching a loading icon spin for half a minute is not a good user experience.

Tip #1: Leverage Streaming

Enter OpenAI’s “streaming” feature, a savior for reducing latency. By setting the Node SDK’s stream parameter to true, we can display words to the user as they become available. This doesn’t speed up the complete generation process, but it dramatically cuts the wait time for the first word. Think of it as the “typewriter” effect seen in ChatGPT.

To peek under the hood, the streaming feature uses an HTML5 capability called Server-Sent Events (SSE). SSE allows servers to push real-time data to web clients over a single HTTP connection. Unlike WebSockets, which are bidirectional, SSE is unidirectional, making it perfect for sending data from the server to the client in scenarios where the client doesn’t need to send data back.
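
For illustration, each chunk from OpenAI’s streaming API arrives as an SSE data event, roughly like this (fields abbreviated):

data: {"choices":[{"delta":{"content":"Host"},"index":0}]}

data: {"choices":[{"delta":{"content":" a"},"index":0}]}

data: [DONE]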

So, we refactor the request to OpenAI to include the stream parameter, and we return the resulting Stream (wrapped in an APIPromise) to our controller method.

export const createIdeas = async ({ occasion }: { occasion: string; }) => {
    return openai.chat.completions.create({
        model: chatModel,
        stream: true,
        messages: [
            {
                role: "user",
                content: `I want to host an event for the following occasion: ${occasion}. Write me a list of 4 separate ideas for this event`,
            },
        ],
    });
};
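
The /generate-ideas handler needs a matching update so it can forward tokens as they arrive. Here is a minimal sketch, assuming the SDK’s stream is an async iterable (as it is in OpenAI’s Node SDK v4) and that the client reads the raw response body rather than parsed SSE frames:

app.post("/generate-ideas", async (req: Request, res: Response) => {
    const { occasion } = req.body;

    // SSE-style headers keep the connection open; the client consumes the raw body
    res.setHeader("Content-Type", "text/event-stream");
    res.setHeader("Cache-Control", "no-cache");
    res.setHeader("Connection", "keep-alive");

    const stream = await createIdeas({ occasion });
    for await (const chunk of stream) {
        // Each chunk carries a small delta of the generated text
        const token = chunk.choices[0]?.delta?.content ?? "";
        if (token) res.write(token);
    }
    res.end();
});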

With the mechanics understood, our next step was to update the frontend client that calls the Express endpoint above to support SSE.

export const requestIdeas = async (occasion: string) => {
    const res = await fetch(`${process.env.REACT_APP_API_BASE_URL}/generate-ideas`, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({ occasion })
    });
    return res.body?.pipeThrough(new TextDecoderStream()).getReader();
};
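
requestIdeas hands back a reader over the decoded text stream. As a quick, hypothetical smoke test, you can drain it and log each chunk as it arrives:

// Hypothetical smoke test: log tokens as they stream in
const logIdeas = async (occasion: string) => {
    const reader = await requestIdeas(occasion);
    if (!reader) return;

    while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        console.log(value); // one decoded chunk, not necessarily one word
    }
};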

The Impact

The results are night and day. From a staggering 30-second wait, users now see the initial AI-generated content within half a second.

Tip #2: UI Components of Streaming

When users are given multiple options to choose from, the UI should split those options into separate components; for example, each option could be a radio button or its own div. But streaming text in real time throws a wrench in the works: how can we parse each AI-generated suggestion into a distinct UI component as it arrives?

The Solution: Add parsing based on a unique character.

  1. Add a unique bullet identifier: Update your prompt to have the AI use an unusual character as a bullet point, one that is unlikely to appear in the rest of the text. We used the “¶” symbol and updated our prompt to include the following: “Start each bullet point with a new line and the ¶ character. Do not include the normal bullet point character, the '-' character, or list numbers.”
  2. Split the stream: We segment each byte array from the SSE endpoint into distinct words. This separation is pivotal, because the content of a single SSE byte array is unpredictable: sometimes it contains a single word, other times a full phrase that includes the “¶” character, such as “subject matter. ¶ Engage”.
  3. Append each word: Once each word is prepared, we append its value to the appropriate UI component. Tracking the “¶” occurrences tells us which component a word belongs to; for instance, after a single “¶”, content belongs to the first-option component. We repeat this process for each chunk until the SSE endpoint closes.


export const parseResponseValue = (index: number, value: string, setters: ((updater: (prev: string) => string) => void)[]) => {
    // Split the chunk into fine-grained pieces so that a "¶" embedded
    // mid-chunk lands in its own piece instead of inside a word
    const splitValue = value.split(/(?! {2})/g);

    for (const piece of splitValue) {
        if (piece.includes('¶')) {
            // A "¶" marks the start of the next option: advance to its component
            index++;
        } else {
            setters[index]((prev: string) => prev + piece);
        }
    }
    // Return the index so the caller can reuse it for the next chunk received from the endpoint
    return index;
};
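Putting it all together, a hypothetical driver function can route streamed words into separate pieces of React state (the setters would typically be four useState setters, one per idea component):

// Hypothetical glue code: feed each streamed chunk through parseResponseValue
const streamIdeas = async (
    occasion: string,
    setters: ((updater: (prev: string) => string) => void)[]
) => {
    const reader = await requestIdeas(occasion);
    if (!reader) return;

    // Start at -1: the first "¶" advances us to component 0.
    // Assumes the model follows the prompt and opens each option with "¶".
    let index = -1;
    while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        index = parseResponseValue(index, value, setters);
    }
};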

Though string parsing occasionally fails due to edge cases in the AI’s output, it enables an overall better user experience by styling the text as it streams in real time. When an anomaly does occur, give users the ability to retry the generation; in general, a retry fixes any parsing issue encountered the first time.

Importantly, by avoiding multiple smaller requests, we economized on the tokens sent to GPT-4. This not only curtailed costs but also improved the quality of the results.

Conclusion

Harnessing the power of generative AI in applications is undeniably transformative, but it doesn't come without its challenges. As we've explored, latency can be a significant hurdle, potentially hampering user experience. However, with innovative solutions like real-time streaming and strategic UI component parsing, we can overcome these challenges, making our applications not only more responsive but also user-friendly.
