Streaming GPT-3.5-turbo Responses

Streaming mode lets you receive a ChatGPT response token by token instead of waiting for the full answer. When you set “stream”: true in your API call, OpenAI sends pieces of the message as they are generated, much like ChatGPT’s typing effect. This improves the user experience on long responses, since the answer appears in real time rather than all at once. As one OpenAI developer notes, streaming returns the “completion back word-after-word (like ChatGPT)”, whereas non-streaming returns the whole response at once. To enable it in PHP, include ‘stream’ => true in your request payload. The API then returns an ongoing HTTP stream in which each chunk contains a JSON object with part of the assistant’s reply. Your PHP script can read these chunks incrementally and output them to the client immediately.

 

Securely Storing Your API Key

Before coding, make sure your OpenAI API key is stored securely. Do not hard-code the key in your PHP scripts. A best practice is to use an environment variable or a separate config file. For example, you might put your key in your server’s environment as OPENAI_API_KEY (or in an .env file) and then in PHP use:

 

$apiKey = getenv('OPENAI_API_KEY');

 

This way the secret stays out of your source code and version control. A StackOverflow answer warns that embedding keys in code “is just asking for people to impersonate you” and recommends environment variables instead.

 

Writing the PHP Script to Call the API

With your key ready, create a plain PHP script (e.g. stream.php). You only need PHP and its built-in cURL extension – no frameworks or extra libraries. Below is an outline of the PHP code to make a streaming API call:

 

<?php
// 1. Load API key from environment
$apiKey = getenv('OPENAI_API_KEY');

// 2. Build the request data with 'stream' => true
$data = [
    'model' => 'gpt-3.5-turbo',
    'messages' => [
        ['role' => 'user', 'content' => 'Hello, how are you?']
    ],
    'stream' => true,    // enable streaming mode
    'max_tokens' => 150
];

// 3. Initialize cURL
$ch = curl_init('https://api.openai.com/v1/chat/completions');
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/json',
    'Authorization: Bearer ' . $apiKey,
    // If you use Server-Sent Events (see below), you can also add:
    // 'Accept: text/event-stream'
]);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));

// 4. Don't buffer the whole response into a string;
//    we'll handle the data incrementally via a callback instead
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);

// 5. Define a write callback to handle each incoming chunk
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function($curl, $chunk) {
    // This function is called by cURL as soon as a new piece of data is available.
    echo $chunk;              // output the chunk immediately
    if (ob_get_level() > 0) {
        ob_flush();           // flush PHP's output buffer, if one is active
    }
    flush();                  // flush the web server's output buffer
    return strlen($chunk);    // must return the number of bytes processed
});

// 6. Execute the request
curl_exec($ch);

// 7. Handle errors and cleanup
if (curl_errno($ch)) {
    // In case of error, print it (you may also log it)
    echo 'API Error: ' . curl_error($ch);
}
curl_close($ch);
?>

 

Code breakdown: We set up a POST to the /v1/chat/completions endpoint with our JSON data, including ‘stream’ => true. We use CURLOPT_WRITEFUNCTION so that cURL calls our callback each time a chunk arrives. Inside that callback, we echo the chunk and immediately flush the output (ob_flush(); flush();) so it goes straight to the browser. This way, each chunk (usually containing a piece of the assistant’s choices[0].delta.content) is sent to the client as soon as it arrives. In effect, the response is processed incrementally rather than all at once.

 

Handling the Stream in PHP

In streaming mode, OpenAI sends a series of server-sent events: each chunk consists of one or more lines beginning with data: followed by a JSON object, and the stream ends with a final data: [DONE] line. The PHP callback above simply echoes whatever arrives, so in the raw output you will see pieces like:

 

data: {"id":"...","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}
data: {"id":"...","choices":[{"delta":{"content":", how"},"index":0,"finish_reason":null}]}
data: {"id":"...","choices":[{"delta":{"content":" are you?"},"index":0,"finish_reason":"stop"}]}
data: [DONE]

 

Each echo $chunk will send that directly to the client. If you’re not using SSE on the client side (see below), you might need to remove the “data:” prefixes yourself. In any case, because we flush after each echo, the browser receives and can display the message piece by piece. This mimics a “typing” effect in the UI.
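If your client consumes the raw stream without EventSource, it has to strip those data: prefixes and unwrap the JSON itself. A minimal sketch in JavaScript — extractDeltas is a hypothetical helper name, and it assumes each call receives complete lines:

```javascript
// Given the raw text of one or more "data:" lines from the stream,
// return the concatenated assistant text contained in them.
function extractDeltas(raw) {
    let text = '';
    for (const line of raw.split('\n')) {
        if (!line.startsWith('data: ')) continue;   // skip blank/other lines
        const payload = line.slice('data: '.length).trim();
        if (payload === '[DONE]') break;            // end-of-stream sentinel
        const chunk = JSON.parse(payload);
        // Each chunk carries the next fragment in choices[0].delta.content
        text += chunk.choices[0].delta?.content ?? '';
    }
    return text;
}
```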

(Tip: Use set_time_limit(0); at the top if you expect long responses, to prevent PHP from timing out.)

 

Real-Time Display on the Web Page

To show the streamed content in the browser as it arrives, you have two common options: Server-Sent Events (SSE) with EventSource, or the Fetch API with Streams. Both let you append text to the page immediately.

 

Option 1: Server-Sent Events (EventSource)

Server-Sent Events is a one-way streaming protocol from server to client. To use it, have your PHP script output with an SSE-compatible header and format. For example, modify the PHP above to start with:

 

header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");
header("X-Accel-Buffering: no"); // stop nginx/proxy buffering so chunks reach the client immediately

 

And make sure each message reaches the client as a data: line followed by a blank line. OpenAI’s streaming chunks already arrive in this format, so the pass-through callback shown earlier usually works unchanged; you only need to add the prefix yourself when sending your own plain-text payloads. For example:

curl_setopt($ch, CURLOPT_WRITEFUNCTION, function($curl, $chunk) {
    // Only needed when $chunk is plain text WITHOUT a "data:" prefix –
    // OpenAI's own chunks are already valid SSE messages.
    echo "data: " . $chunk . "\n\n";
    ob_flush(); flush();
    return strlen($chunk);    // always return the original chunk length
});

 

This ensures the browser’s EventSource can read the stream. On the HTML/JS side, you would write:

<div id="chatOutput"></div>
<script>
// Create an EventSource listening to our PHP stream endpoint.
// (For the ?prompt parameter to have any effect, stream.php must read
// $_GET['prompt'] into the messages array – the script above hard-codes it.)
const source = new EventSource('stream.php?prompt=Hello');

// Append each received message to the page.
source.onmessage = function(event) {
    document.getElementById('chatOutput').innerText += event.data;
};
</script>
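Note that if stream.php forwards OpenAI’s raw chunks unmodified, event.data will contain JSON rather than plain text. In that case the handler should parse each event and close the connection when the [DONE] sentinel arrives — a sketch (parseEventData is a hypothetical helper, not part of any API):

```javascript
// Returns the text fragment inside one SSE event's data payload,
// or null when the stream signals completion with "[DONE]".
function parseEventData(data) {
    if (data.trim() === '[DONE]') return null;
    const parsed = JSON.parse(data);
    // A chunk's delta may omit "content" (e.g. the initial role-only chunk)
    return parsed.choices[0].delta.content ?? '';
}

// Wiring it into the EventSource handler from above:
// source.onmessage = function(event) {
//     const text = parseEventData(event.data);
//     if (text === null) { source.close(); return; }
//     document.getElementById('chatOutput').innerText += text;
// };
```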

 

As chunks arrive, the onmessage handler fires with event.data set to whatever followed the data: prefix. This uses the browser’s built-in EventSource mechanism and requires no external libraries. A generic PHP SSE loop (unrelated to OpenAI) looks like:

<?php
header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");
while (!connection_aborted()) {   // stop when the client disconnects
    echo "data: " . date("H:i:s") . "\n\n";
    ob_flush(); flush();
    sleep(1);
}
?>

 

which demonstrates sending data: lines to the client. By analogy, our OpenAI loop sends the AI’s content.

Option 2: Fetch with ReadableStream

If you prefer a more modern JS approach, you can use fetch() and read the response.body stream manually. For example:

<div id="chatOutput"></div>
<script>
async function streamOpenAI() {
    const response = await fetch('stream.php?prompt=Hello');
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let done = false;
    let textChunk;

    while (!done) {
        // Read the next data chunk from the stream
        const { value, done: streamDone } = await reader.read();
        done = streamDone;
        if (value) {
            // Decode and append the text to the page
            textChunk = decoder.decode(value, { stream: true });
            document.getElementById('chatOutput').innerText += textChunk;
        }
    }
}

streamOpenAI();
</script>

 

Here we call the same PHP URL. As each value arrives, we decode it and update the page. This uses the JavaScript Streams API so the UI can update piece-by-piece without waiting. It does not require explicit “data:” formatting on the server (though if your PHP sends SSE format, you’d strip the data: prefix in JS). Note that using reader.read() in a loop like this lets us process the incoming bytes as soon as they come, similar to how EventSource works.
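One caveat: reader.read() hands you whatever bytes have arrived, so a single read may end in the middle of an SSE line and the next read may start with its other half. A robust client keeps a buffer and only processes complete lines. A sketch, assuming the server forwards OpenAI’s data: lines verbatim (createSseParser is a made-up name):

```javascript
// Incrementally parse SSE "data:" lines that may be split across reads.
// onText is called once per extracted text fragment, in order.
function createSseParser(onText) {
    let buffer = '';
    return function feed(textChunk) {
        buffer += textChunk;
        const lines = buffer.split('\n');
        buffer = lines.pop();                     // keep the incomplete tail
        for (const line of lines) {
            if (!line.startsWith('data: ')) continue;
            const payload = line.slice('data: '.length).trim();
            if (payload === '[DONE]') continue;   // end-of-stream sentinel
            const delta = JSON.parse(payload).choices[0].delta;
            if (delta.content) onText(delta.content);
        }
    };
}
```

Inside the while loop above you would call feed(textChunk) instead of appending the raw decoded text directly.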

 

Summary

In summary, to stream GPT-3.5-turbo responses in plain PHP:

  • Enable streaming: Include “stream”: true in the JSON payload.

  • Secure your key: Store the API key in an environment variable or external file.

  • Use cURL with a write callback: Set CURLOPT_WRITEFUNCTION to echo each chunk and flush immediately.

  • Output to the client: Either format the output as SSE (text/event-stream) and use EventSource in JS, or fetch the stream and read it with the Streams API.

  • Flush promptly: Call ob_flush() and flush() after each echo so data is sent right away.

By following these steps, your PHP app can display GPT-3.5’s answer in real time as it’s generated. The user will see the text appear gradually, creating a smooth, interactive chat experience.

Sources: adapted from official OpenAI guidance and community examples covering streaming mode (blog.gopenai.com, stackoverflow.com), secure API key storage (stackoverflow.com), and real-time streaming with Server-Sent Events (openfaas.com, stackoverflow.com, stack.convex.dev).

 
