
Build a Chat Room for Ollama with FastAPI: A Step-by-Step Guide to a Visual Interface!

Tired of "awkward chatting" with your Ollama model in the command line? Want to give it a more intuitive and cool chat interface? Good news! With Python's FastAPI framework and a bit of frontend magic, we can easily make it happen!

This guide will take you step-by-step, from scratch, to build a FastAPI-based visual chat application for Ollama. We'll use WebSocket technology to achieve real-time, typing-like chat effects. Ready? Let's build something cool!

Step 1: Blueprint Planning - Let's See What Our "Construction Site" Looks Like

Before we start, let's familiarize ourselves with the basic structure of the project. The code is all in the fastapi_chat_app folder of the handy-ollama repository.

fastapi_chat_app/
├── app.py             # The "brain" of the FastAPI application, handling requests and logic
├── websocket_handler.py # (Optional) Handles real-time chat messages; this example keeps that logic in app.py
├── static/            # The place to store static files
│   └── index.html     # The "face" of our chat interface (HTML)
└── requirements.txt   # The list of "building materials" needed for the project (Python libraries)
  • app.py: The core of the FastAPI application, responsible for starting the service and defining access paths (such as the WebSocket connection for the chat window).
  • websocket_handler.py: (Optional) If the chat logic grows, it can live in this separate module; in this example it stays in app.py. It handles the WebSocket connection: receiving user messages, passing them to Ollama, and streaming Ollama's responses back to the frontend in real time.
  • static/index.html: The chat window page that users see.
  • requirements.txt: Lists all the Python libraries that our project needs to install.
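
For reference, the dependency list is short. A minimal requirements.txt for this kind of app might look roughly like the sketch below; this is an assumption for illustration, so prefer the exact file (and any pinned versions) that ships with the repository:

text
fastapi            # the web framework
uvicorn[standard]  # ASGI server used in Step 4; the [standard] extra bundles WebSocket support
ollama             # Python client for talking to the local Ollama service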

Step 2: Preparations - Get the Code and "Building Materials"

  1. "Move" the code home (clone the repository): Open your terminal (command line) and run this command to download the code to your computer:

    bash
    git clone https://github.com/AXYZdong/handy-ollama
  2. Enter the "construction site" and install "building materials" (dependencies):

    bash
    cd handy-ollama/fastapi_chat_app
    pip install -r requirements.txt

    The second command will automatically read the requirements.txt file and help you install everything you need, such as FastAPI and the Ollama Python library.

    Important prerequisite: Before proceeding to the next step, please ensure that your Ollama service is already running locally! The FastAPI application needs to connect to it to work.
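
    Not sure whether Ollama is running? Here is a quick, minimal check (not part of the repository; quick_check.py is just a suggested name) that uses the Ollama Python library you just installed:

    python
    # quick_check.py - verify that the local Ollama service is reachable before starting the app.
    # If this raises a connection error, start the service (ollama serve) and make sure
    # the model you plan to use is pulled (ollama pull llama3.1).
    import ollama

    print(ollama.list())  # lists locally installed models; raises if Ollama is not running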

Step 3: Core Revelation - How WebSocket Makes Chat "Real-Time"

The core of our magic is in the app.py file, which uses WebSocket technology to achieve real-time bidirectional communication between the browser and the backend. Let's take a look at the key code:

python
import ollama # Used to communicate with the Ollama model
from fastapi import FastAPI, WebSocket, WebSocketDisconnect # FastAPI framework and WebSocket support
from fastapi.staticfiles import StaticFiles # Used to provide static files (our HTML page)
import os # Used to handle file paths

app = FastAPI()

# --- Set up the static file directory ---
# Let FastAPI know that it can provide files from the 'static' folder
# __file__ is the current script path, os.path.dirname gets the directory name
# os.path.join joins paths to ensure cross-platform compatibility
static_dir = os.path.join(os.path.dirname(__file__), "static")
app.mount("/static", StaticFiles(directory=static_dir), name="static")

@app.get("/") # Add a root path to make it easier to directly access the chat page
async def read_root():
    # Serve the chat page directly; visiting /static/index.html also works
    # (a template engine like Jinja2 would be more flexible, but FileResponse keeps it simple)
    from fastapi.responses import FileResponse
    return FileResponse(os.path.join(static_dir, 'index.html'))


# --- WebSocket Chat Core Logic ---
@app.websocket("/ws") # Define the WebSocket connection path as /ws
async def websocket_endpoint(websocket: WebSocket):
    print("Client is trying to connect to WebSocket...")
    await websocket.accept()  # Accept the WebSocket connection request from the browser
    print("WebSocket connection established!")
    try:
        # Enter a loop to continuously wait for and process messages from the client
        while True:
            user_input = await websocket.receive_text()  # Wait to receive the text message sent by the user
            print(f"Received user message: {user_input}")

            # Call the Ollama API for streaming chat
            stream = ollama.chat(
                model='llama3.1',  # The Ollama model you want to use, make sure it exists locally
                messages=[{'role': 'user', 'content': user_input}],
                stream=True  # Key! Enable streaming response
            )

            print("Getting streaming response from Ollama...")
            # Receive Ollama's streaming reply and send it back to the frontend chunk by chunk
            try:
                full_response = "" # Accumulate the full reply so we can log it once the stream finishes
                for chunk in stream:
                    model_output_chunk = chunk['message']['content']
                    full_response += model_output_chunk
                    # Send this small chunk of reply back to the browser immediately via WebSocket
                    await websocket.send_text(model_output_chunk)
                    # print(f"Sending chunk: {model_output_chunk}") # You can see the content being sent in the background
                print(f"Ollama reply completed: {full_response[:100]}...") # Log part of the full reply to the server console
            except Exception as e:
                error_msg = f"Error interacting with Ollama: {e}"
                print(error_msg)
                await websocket.send_text(error_msg) # Send the error message to the frontend as well
                # Optionally `break` here to stop the loop and close the connection after an error
    except WebSocketDisconnect:
        # The client closed the connection (tab closed, page refreshed, etc.)
        print("Client disconnected.")
    except Exception as e:
        # Handle other errors in the WebSocket connection itself, then close it cleanly
        print(f"WebSocket connection error: {e}")
        await websocket.close()
    finally:
        # Runs on every exit path
        print("WebSocket connection closed.")

# --- (Optional) Add startup command instructions for easy running ---
if __name__ == "__main__":
    import uvicorn
    print("Starting FastAPI application, visit http://localhost:8000 or http://localhost:8000/static/index.html")
    # Listen on all network interfaces (convenient during development; restrict this in production)
    uvicorn.run(app, host="0.0.0.0", port=8000)

Code Highlights Explained:

  1. import: Imports the FastAPI, WebSocket, and Ollama libraries.
  2. app = FastAPI(): Creates a FastAPI application instance.
  3. app.mount("/static", ...): Tells FastAPI to serve the content in the static folder (such as index.html) as static files.
  4. @app.get("/"): Adds a root path handler, making it easy for users to directly visit http://localhost:8000 to see the chat page (although directly accessing /static/index.html also works).
  5. @app.websocket("/ws"): This is where the magic happens! It defines a function to handle WebSocket connections. The JavaScript in the browser will connect to this /ws path.
  6. await websocket.accept(): Handshake successful, the browser and server establish a persistent WebSocket connection.
  7. while True and await websocket.receive_text(): The server continuously listens, waiting for the user to enter and send messages in the browser.
  8. ollama.chat(..., stream=True): Sends the user message to the Ollama model and tells it that we want to receive the reply in streaming mode (the standalone snippet right after this list shows what those streamed chunks look like).
  9. for chunk in stream: Iterates through the small chunks of replies returned by Ollama.
  10. await websocket.send_text(model_output_chunk): Core! Sends each small chunk of the reply that is obtained immediately back to the browser via WebSocket. The frontend JavaScript can then update the chat interface in real-time, achieving the typing effect.
  11. try...except...finally: Robustness handling, so the program reacts gracefully to errors or to the client disconnecting.
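
If you want to see what those streaming chunks look like without any FastAPI or WebSocket code in the way, you can run the chat call from item 8 on its own. Below is a minimal sketch (not part of the repository; stream_demo.py is just a suggested name) that prints the reply chunk by chunk to the terminal:

python
# stream_demo.py - watch ollama.chat() stream a reply, independent of FastAPI
import ollama

stream = ollama.chat(
    model='llama3.1',  # must already be pulled locally (ollama pull llama3.1)
    messages=[{'role': 'user', 'content': 'Say hello in one short sentence.'}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a small piece of the reply under ['message']['content']
    print(chunk['message']['content'], end='', flush=True)
print()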

Simply put, it is: browser sends a message -> FastAPI receives -> forwards to Ollama -> Ollama replies in streaming mode -> FastAPI receives chunk by chunk -> sends back to the browser chunk by chunk -> browser displays in real-time.
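
Want to exercise this loop without a browser? The hedged sketch below (also not part of the repository) sends one message to /ws from the terminal and prints the streamed chunks. It assumes the server from Step 4 is running on localhost:8000 and that the third-party websockets package is installed (pip install websockets):

python
# ws_client_test.py - talk to the /ws endpoint from the terminal
import asyncio
import websockets  # third-party package: pip install websockets

async def main():
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        await ws.send("Hello, who are you?")
        try:
            while True:
                # app.py streams raw text chunks with no explicit end-of-reply marker,
                # so we simply stop after a few seconds of silence.
                chunk = await asyncio.wait_for(ws.recv(), timeout=5)
                print(chunk, end="", flush=True)
        except asyncio.TimeoutError:
            print("\n[stream finished or went quiet]")

asyncio.run(main())

Because the server sends only raw text chunks, any client (including static/index.html) has to decide on its own when a reply has finished; adding an explicit end-of-message marker would be a natural next improvement.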

Step 4: Fire It Up! Get the Application Running

  1. Make sure you are in the correct directory: Your terminal should still be in the handy-ollama/fastapi_chat_app directory.

  2. Start the FastAPI application:

    bash
    uvicorn app:app --reload --host 0.0.0.0 --port 8000
    • uvicorn is a high-performance ASGI server, specifically used to run asynchronous applications like FastAPI.
    • app:app refers to running the FastAPI instance named app in the app.py file.
    • --reload is a great helper during development: when you modify and save the code, the service restarts automatically, so you don't have to stop and start it by hand.
    • --host 0.0.0.0 allows other devices on the local network to access your application (if needed). Use 127.0.0.1 or omit it for access only on the local machine.
    • --port 8000 specifies the port the service listens on. You can change it to another one as long as it is not occupied.
  3. Open your browser: Enter http://localhost:8000 (or the IP and port you specified when starting) in the browser's address bar. If everything goes well, you should see that simple chat interface!

  4. Start chatting! Type your question in the input box and press Enter or click Send. Then watch the magic: Ollama's reply appears in the chat window in real time, as if it were being typed!

See the effect:

  • The front-end interface looks roughly like this: [Chat Interface Screenshot]
  • Your terminal (backend) will print logs like this: [Background Log Screenshot]