Build a Chat Room for Ollama with FastAPI: A Step-by-Step Guide to a Visual Interface!
Tired of "awkward chatting" with your Ollama model in the command line? Want to give it a more intuitive and cool chat interface? Good news! With Python's FastAPI framework and a bit of frontend magic, we can easily make it happen!
This guide will take you step-by-step, from scratch, to build a FastAPI-based visual chat application for Ollama. We'll use WebSocket technology to achieve real-time, typing-like chat effects. Ready? Let's build something cool!
Step 1: Blueprint Planning - Let's See What Our "Construction Site" Looks Like
Before we start, let's get familiar with the basic structure of the project. All the code lives in the `fastapi_chat_app` folder of the handy-ollama repository.
```
fastapi_chat_app/
├── app.py                 # The "brain" of the FastAPI application, handling requests and logic
├── websocket_handler.py   # Handles real-time chat messages (if the logic is complex; this example keeps it in app.py)
├── static/                # The place to store static files
│   └── index.html         # The "face" of our chat interface (HTML)
└── requirements.txt       # The list of "building materials" the project needs (Python libraries)
```
- `app.py`: The core of the FastAPI application, responsible for starting the service and defining the access paths (such as the WebSocket connection for the chat window). In this example it also holds the simplified chat logic.
- `websocket_handler.py`: (Used if the chat logic is complex enough to live separately.) Handles WebSocket connections: receives user messages, sends them to Ollama, and streams Ollama's responses back to the frontend in real time.
- `static/index.html`: The chat window page that users see.
- `requirements.txt`: Lists all the Python libraries the project needs to install.
Step 2: Preparations - Get the Code and "Building Materials"
"Move" the code home (clone the repository): Open your terminal (command line) and run this command to download the code to your computer:
bashgit clone https://github.com/AXYZdong/handy-ollama
Enter the "construction site" and install "building materials" (dependencies):
bashcd handy-ollama/fastapi_chat_app pip install -r requirements.txt
The second command will automatically read the
requirements.txt
file and help you install everything you need, such as FastAPI and the Ollama Python library.Important prerequisite: Before proceeding to the next step, please ensure that your Ollama service is already running locally! The FastAPI application needs to connect to it to work.
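Not sure whether Ollama is actually up? Here is a tiny optional check, a sketch that assumes the `ollama` Python package from the step above is installed and that the service runs on its default local address:

```python
# Optional sanity check: verify that the local Ollama service is reachable
# before starting the FastAPI app.
import ollama

try:
    ollama.list()  # lists locally available models; raises if the service is unreachable
    print("Ollama service is reachable. You're good to go!")
except Exception as exc:
    print(f"Could not reach Ollama: {exc}")
    print("Start the Ollama service (e.g. `ollama serve` or the desktop app) and try again.")
```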
Step 3: Core Revelation - How WebSocket Makes Chat "Real-Time"
The core of our magic lives in the `app.py` file, which uses WebSocket technology to achieve real-time, bidirectional communication between the browser and the backend. Let's take a look at the key code:
```python
import ollama  # Used to communicate with the Ollama model
from fastapi import FastAPI, WebSocket  # FastAPI framework and WebSocket support
from fastapi.staticfiles import StaticFiles  # Used to serve static files (our HTML page)
from fastapi.responses import FileResponse  # Used to return the chat page directly
import os  # Used to handle file paths

app = FastAPI()

# --- Set up the static file directory ---
# Let FastAPI know that it can serve files from the 'static' folder.
# __file__ is the current script path, os.path.dirname gets its directory,
# and os.path.join joins paths to ensure cross-platform compatibility.
static_dir = os.path.join(os.path.dirname(__file__), "static")
app.mount("/static", StaticFiles(directory=static_dir), name="static")


@app.get("/")  # Add a root path to make it easier to access the chat page directly
async def read_root():
    # You could redirect to the chat page or return a simple welcome message.
    # Here we return the HTML file directly (a template engine like Jinja2 is
    # recommended for bigger apps, but this keeps things simple).
    # An even simpler option is to have users open /static/index.html themselves.
    return FileResponse(os.path.join(static_dir, "index.html"))


# --- WebSocket chat core logic ---
@app.websocket("/ws")  # Define the WebSocket connection path as /ws
async def websocket_endpoint(websocket: WebSocket):
    print("Client is trying to connect to WebSocket...")
    await websocket.accept()  # Accept the WebSocket connection request from the browser
    print("WebSocket connection established!")
    try:
        # Loop forever, waiting for and processing messages from the client
        while True:
            user_input = await websocket.receive_text()  # Wait for the user's text message
            print(f"Received user message: {user_input}")

            # Call the Ollama API for a streaming chat
            stream = ollama.chat(
                model='llama3.1',  # The Ollama model you want to use; make sure it exists locally
                messages=[{'role': 'user', 'content': user_input}],
                stream=True  # Key! Enable streaming responses
            )
            print("Getting streaming response from Ollama...")

            # Receive Ollama's streaming reply and send it back to the frontend chunk by chunk
            try:
                full_response = ""  # Accumulate the full reply for logging; streaming happens below either way
                for chunk in stream:
                    model_output_chunk = chunk['message']['content']
                    full_response += model_output_chunk
                    # Immediately send this small chunk back to the browser via WebSocket
                    await websocket.send_text(model_output_chunk)
                    # print(f"Sending chunk: {model_output_chunk}")  # Uncomment to watch chunks in the console
                print(f"Ollama reply completed: {full_response[:100]}...")  # Log the start of the full reply
            except Exception as e:
                error_msg = f"Error interacting with Ollama: {e}"
                print(error_msg)
                await websocket.send_text(error_msg)  # Send the error message to the frontend as well
                # Optionally `break` here if you want to drop the connection after an Ollama error
    except Exception as e:
        # Handle errors in the WebSocket connection itself, such as the client disconnecting suddenly
        print(f"WebSocket connection error: {e}")
    finally:
        # In any case, try to close the connection at the end
        print("Closing WebSocket connection...")
        await websocket.close()
        print("WebSocket connection closed.")


# --- (Optional) Run the app directly for convenience ---
if __name__ == "__main__":
    import uvicorn
    print("Starting FastAPI application, visit http://localhost:8000 or http://localhost:8000/static/index.html")
    # host="0.0.0.0" allows access from any interface (convenient during development;
    # stricter settings are required in production)
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
Code Highlights Explained:
- `import`: Brings in the FastAPI, WebSocket, and Ollama libraries.
- `app = FastAPI()`: Creates a FastAPI application instance.
- `app.mount("/static", ...)`: Tells FastAPI to serve the contents of the `static` folder (such as `index.html`) as static files.
- `@app.get("/")`: Adds a root path handler so users can simply visit `http://localhost:8000` to see the chat page (although opening `/static/index.html` directly also works).
- `@app.websocket("/ws")`: This is where the magic happens! It defines a function that handles WebSocket connections; the JavaScript in the browser connects to this `/ws` path.
- `await websocket.accept()`: Handshake successful; the browser and server establish a persistent WebSocket connection.
- `while True` and `await websocket.receive_text()`: The server keeps listening, waiting for the user to type and send messages in the browser.
- `ollama.chat(..., stream=True)`: Sends the user's message to the Ollama model and asks for the reply in streaming mode.
- `for chunk in stream`: Iterates over the small chunks of the reply returned by Ollama.
- `await websocket.send_text(model_output_chunk)`: The core! Sends each chunk of the reply back to the browser via WebSocket as soon as it arrives, so the frontend JavaScript can update the chat interface in real time and create the typing effect.
- `try...except...finally`: Robustness handling, ensuring errors and disconnections are dealt with gracefully (see the sketch after this list).
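On that last point, a small refinement worth knowing about (a sketch, not part of the original app): FastAPI raises a dedicated `WebSocketDisconnect` exception when the browser closes the connection, so catching it explicitly keeps normal disconnects from being logged as errors. The echo reply below is just a placeholder standing in for the Ollama streaming logic.

```python
# Sketch only: a handler skeleton that treats a normal client disconnect
# differently from a real error. Replace the echo with the Ollama streaming code.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            user_input = await websocket.receive_text()
            await websocket.send_text(f"echo: {user_input}")  # placeholder for the Ollama call
    except WebSocketDisconnect:
        print("Client disconnected normally.")  # the browser tab closed; not an error
    except Exception as e:
        print(f"WebSocket connection error: {e}")
```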
Simply put: the browser sends a message -> FastAPI receives it -> forwards it to Ollama -> Ollama replies in streaming mode -> FastAPI receives the reply chunk by chunk -> sends it back to the browser chunk by chunk -> the browser displays it in real time.
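If you want to see the "Ollama replies in streaming mode" part of that flow on its own, here is a minimal sketch that calls the streaming API from a plain Python script, with no FastAPI or WebSocket involved. It assumes the `ollama` package is installed, the service is running, and the `llama3.1` model has been pulled locally; the prompt is just an example.

```python
# Minimal sketch: watch Ollama's streaming reply build up in the terminal.
# This is the same ollama.chat(..., stream=True) call the WebSocket handler uses,
# only printed locally instead of being sent to the browser.
import ollama

stream = ollama.chat(
    model='llama3.1',  # any locally pulled model works
    messages=[{'role': 'user', 'content': 'Explain WebSockets in one sentence.'}],
    stream=True,  # yield the reply chunk by chunk instead of all at once
)

for chunk in stream:
    # Each chunk carries a small piece of the reply; print it without a newline
    # to see the "typing" effect.
    print(chunk['message']['content'], end='', flush=True)
print()
```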
Step 4: Fire It Up! Get the Application Running
1. Make sure you are in the correct directory: your terminal should still be in `handy-ollama/fastapi_chat_app`.

2. Start the FastAPI application:

   ```bash
   uvicorn app:app --reload --host 0.0.0.0 --port 8000
   ```

   - `uvicorn` is a high-performance ASGI server, built for running asynchronous applications like FastAPI.
   - `app:app` means: run the FastAPI instance named `app` defined in the `app.py` file.
   - `--reload` is a great helper during development: whenever you modify and save the code, the service restarts automatically, so you don't have to stop and restart it by hand.
   - `--host 0.0.0.0` allows other devices on the local network to access your application (if needed). Use `127.0.0.1` or omit it for access from the local machine only.
   - `--port 8000` specifies the port the service listens on. You can change it to any other port that is not already in use.
3. Open your browser: enter `http://localhost:8000` (or the IP and port you specified when starting) in the address bar. If everything goes well, you should see the simple chat interface!

4. Start chatting! Type your question in the input box and press Enter or click Send. Witness the moment of magic: Ollama's reply appears in the chat window in real time, as if it were being typed!
See the effect:
- The front-end chat interface appears in your browser (the page served from `static/index.html`).
- Your terminal (the backend) prints logs such as "Received user message: ..." and "Ollama reply completed: ..." as you chat.