Last week we built an LLM-based chatbot with a knowledge base, and to my surprise a lot of people were interested in it. I thought it would be fun to continue on that topic, so let's make the chatbot a bit more advanced with a sprinkle of WebSocket magic. This will demonstrate how easy it is to build a usable LLM application with GChain.
The default experience people expect from a chatbot today is a streaming response à la ChatGPT, but that experience is not a given. Not all off-the-shelf models support it; as far as I know, only OpenAI's models stream their responses. Delivering the response to the client is also more involved, since a plain RESTful API is not sufficient. That's why we'll use WebSocket to stream the response.
The plan is:
- Prepare a WebSocket server and a basic handler
- Make a conversation chain with a streaming model session
- Get the user's input and stream the response through WebSocket
Prepare the WebSocket boilerplate
I hope the term WebSocket (ws) doesn't intimidate you; it will be simple and fun :). The ws boilerplate is not much different from the usual HTTP server and handler, and github.com/gorilla/websocket
will be used to make things much easier. There are two big differences: 1.) the initial HTTP request is upgraded to a ws connection, and 2.) since the connection is long-lived, input and output are streamed using the ReadMessage and WriteMessage functions.
// the gorilla upgrader, declared at package level so wshandler can use it
var upgrader = websocket.Upgrader{}

func main() {
	// websocket route
	http.HandleFunc("/chat", wshandler)

	log.Println("http server started on :8000")
	err := http.ListenAndServe(":8000", nil)
	if err != nil {
		log.Fatal("ListenAndServe: ", err)
	}
}
func wshandler(w http.ResponseWriter, r *http.Request) {
	// Upgrade the initial GET request to a websocket connection
	ws, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		// don't Fatal here: that would bring down the whole server
		log.Println(err)
		return
	}
	// Make sure we close the connection when the function returns
	defer ws.Close()
}
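To check that everything is wired up, you can poke the endpoint from a small Go client. This is just a sketch: the message text is made up, and the streamed replies will only appear once the full handler below is in place.

package main

import (
	"log"

	"github.com/gorilla/websocket"
)

func main() {
	// dial the chat endpoint exposed by the server above
	conn, _, err := websocket.DefaultDialer.Dial("ws://localhost:8000/chat", nil)
	if err != nil {
		log.Fatal("dial:", err)
	}
	defer conn.Close()

	// send one question, then print every frame the server streams back
	if err := conn.WriteMessage(websocket.TextMessage, []byte("Hi there!")); err != nil {
		log.Fatal("write:", err)
	}
	for {
		_, msg, err := conn.ReadMessage()
		if err != nil {
			log.Println("read:", err)
			return
		}
		log.Printf("received: %s", msg)
	}
}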
Make a conversation chain with a streaming model session
This part is where GChain comes to help us. We create a conversation chain within the ws session, so the conversation memory is kept as long as the session is alive. (The chatModel passed to the chain is the streaming model session; its setup is elided here.)
// all communication through ws will be in this format
type message struct {
	Text     string `json:"text"`
	Finished bool   `json:"finished"`
}
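On the wire, each streamed chunk thus arrives at the client as a small JSON object such as {"text":"Hi, ","finished":false}, followed by a final {"finished":true} frame that marks the end of the answer (the texts here are made up for illustration).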
func wshandler(w http.ResponseWriter, r *http.Request) {
	log.Println("serving ws connection")

	// new conversation memory created
	memory := []model.ChatMessage{}
	// streamingChannel to get the response from the model
	streamingChannel := make(chan model.ChatMessage, 100)
	// set up a new conversation chain
	convoChain := conversation.NewConversationChain(chatModel, memory, callback.NewManager(),
		"You're a helpful chatbot that answers human questions very concisely, respond with formatted HTML.", false)
	// append greeting to the memory
	convoChain.AppendToMemory(model.ChatMessage{Role: model.ChatMessageRoleAssistant, Content: "Hi, My name is GioAI"})

	......

	// send greetings to the user
	m, err := json.Marshal(message{Text: "Hi, My name is GioAI", Finished: true})
	if err != nil {
		log.Println(err)
		return
	}
	ws.WriteMessage(websocket.TextMessage, m)
}
Handle Messages
With the WebSocket boilerplate and the conversation chain set up, we're ready to receive messages from and send messages to the client. All messages use the message struct defined above. There are four main activities here: 1.) get the user's message from ws, 2.) send the user's message to the model, 3.) get the model's response, and 4.) stream the response back to the user.
A main loop is created within wshandler. Note that the call to the model runs in a goroutine, as we don't want the model call to block the loop. The streamingChannel is then used to handle the model's response.
func wshandler(w http.ResponseWriter, r *http.Request) {
	.....

	// main loop to handle the user's input
	for {
		// read the user's next message from the ws connection
		_, requestMessage, err := ws.ReadMessage()
		if err != nil {
			log.Printf("error: %v", err)
			break
		}

		// the whole model output will be kept here
		var output string

		// send the request to the model in a goroutine, as we don't want to block here
		go func() {
			var err error
			output, err = convoChain.SimpleRun(context.Background(), string(requestMessage),
				model.WithIsStreaming(true), model.WithStreamingChannel(streamingChannel))
			if err != nil {
				fmt.Println("error " + err.Error())
				return
			}
		}()

		// handle the response streaming (expanded below)
		for {
			value, ok := <-streamingChannel
			.....
		}
	}
}
For every streamed chunk from the model, we put it into a message struct, marshal it as JSON, and send it to the client. Since we're sending the response in small chunks, the client has no obvious way to identify the end of the message, hence the finished field is important to have.
// the main loop
for {
	.....

	// handle the response streaming
	for {
		value, ok := <-streamingChannel
		if ok && !model.IsStreamFinished(value) {
			m, err := json.Marshal(message{Text: value.Content, Finished: false})
			if err != nil {
				log.Println(err)
				continue
			}
			ws.WriteMessage(websocket.TextMessage, m)
		} else {
			// set the Finished field to true at the end of the message
			m, err := json.Marshal(message{Finished: true})
			if err != nil {
				log.Println(err)
				continue
			}
			ws.WriteMessage(websocket.TextMessage, m)
			break
		}
	}
	// put the user's message and the model's response into conversation memory
	convoChain.AppendToMemory(model.ChatMessage{Role: model.ChatMessageRoleUser, Content: string(requestMessage)})
	convoChain.AppendToMemory(model.ChatMessage{Role: model.ChatMessageRoleAssistant, Content: output})
} // the end of the main loop
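One caveat worth noting: output is assigned inside the goroutine but read by the main loop after the streaming loop breaks, so depending on GChain's internals this can race. A minimal sketch of one way to make it safe, using a done channel (outputDone is my own name, not part of GChain):

// outputDone signals that SimpleRun has returned and output is safe to read
outputDone := make(chan struct{})
go func() {
	defer close(outputDone)
	var err error
	output, err = convoChain.SimpleRun(context.Background(), string(requestMessage),
		model.WithIsStreaming(true), model.WithStreamingChannel(streamingChannel))
	if err != nil {
		fmt.Println("error " + err.Error())
	}
}()

// ... stream the chunks exactly as above ...

// wait for the goroutine before appending output to memory
<-outputDone
convoChain.AppendToMemory(model.ChatMessage{Role: model.ChatMessageRoleAssistant, Content: output})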
The complete code comes in at just over 100 lines; you can find it in GChain's examples, together with a simple HTML client.