Ollama GPU Acceleration Configuration Guide (Windows & Linux): Make AI Inference Lightning Fast!

Feeling like Ollama is a bit sluggish when running models on CPU? Want to experience AI speed with near-instant responses? Great news! If your computer has a decent NVIDIA graphics card (GPU), it's your secret weapon! GPUs are naturally skilled at parallel processing, so letting them handle AI inference can leave CPUs in the dust!

This guide will walk you through unlocking Ollama's GPU acceleration capabilities, whether you're a Windows enthusiast or a Linux devotee.

Why use GPU acceleration? Simply put:

  • Fast! Fast! Fast! Waiting times drop dramatically, and model responses arrive noticeably quicker.
  • Smoother Experience: Real-time chat, code generation? GPUs make interactions smoother, eliminating lag.
  • Handle Larger Models: A GPU's dedicated memory (VRAM) and high bandwidth let it run larger, more capable models smoothly.

Preparations Before You Start (Please Confirm!)

Before we begin, make sure your "equipment" meets the requirements:

  1. You have an NVIDIA graphics card: This is essential! The CUDA-based setup in this guide does not apply to AMD or Intel graphics.
  2. You have installed the latest NVIDIA drivers: This is key to enabling communication between the system and the graphics card. How to check?
    • Open Command Prompt (Windows) or Terminal (Linux).
    • Enter nvidia-smi and press Enter.
      • Success: If you see a bunch of information about your graphics card model, driver version, temperature, VRAM usage, etc., congratulations, the driver is fine!
      • Failure: If you see an error like "'nvidia-smi' is not recognized as an internal or external command" (Windows) or "command not found" (Linux), you need to download and install the latest driver from the NVIDIA website.
  3. CUDA Toolkit? (Usually, you don't need to worry): CUDA is NVIDIA's platform that allows programs to utilize GPU computing. Fortunately, Ollama is usually smart enough to handle these dependencies on its own. You can use nvcc --version to check (same method as above).
    • If you see the CUDA version number, you have installed it.
    • If you see "command not found", don't rush to install it! Ollama probably doesn't need you to install it manually. Only consider manually installing the CUDA Toolkit if you find that GPU acceleration still doesn't work after configuring it.
  4. (Optional) Know your GPU's "ID card" (UUID): If you have more than one NVIDIA graphics card and want to specify which one Ollama should use, you need to know its UUID (unique identifier).
    • Still in the Command Prompt/Terminal, enter nvidia-smi -L and press Enter.
    • You will see information like GPU 0: NVIDIA GeForce RTX 3080 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx). Write down the long string starting with GPU-. This is the UUID of the card you want to specify.
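
For convenience, here are this section's three checks in one place (output will of course vary with your hardware and driver version):

```bash
# 1. Driver check: prints a table with GPU model, driver version, and VRAM usage
nvidia-smi

# 2. CUDA Toolkit check (optional; per the note above, Ollama usually handles this itself)
nvcc --version

# 3. List GPUs with their UUIDs; note the GPU-... string of the card you want
nvidia-smi -L
```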

Ready? Let's get started, system by system!

Windows Systems: Let Your GPU Shine!

Step 1: Set the "Magic Switch" Environment Variable

We need to tell Ollama to "use the GPU for work" by setting an "environment variable."

  1. Find the "Environment Variables" settings:

    • The fastest way: Press the Win key, search for "environment variables", and select "Edit the system environment variables".
    • In the "System Properties" window that pops up, click the "Environment Variables(N)..." button at the bottom right.
  2. Create the OLLAMA_GPU_LAYER variable:

    • In the "System variables(S)" area below (this applies to all users), click "New(W)...".
    • Fill in the following information:
      • Variable name: OLLAMA_GPU_LAYER
      • Variable value: cuda (lowercase, this tells Ollama to use the CUDA path to enable the GPU)
    • Click "OK" to save.

    Small Tip: System Variables vs. User Variables? System variables apply to all users on the computer, while user variables only apply to the currently logged-in user. It is generally recommended to set it as a system variable. If the same variable name is set in both places, the user variable takes precedence.

  3. (Optional) If you have multiple GPUs, specify which one to use:

    • Remember the UUID we found earlier? It's time to use it!
    • Still in the "System variables" area, create another new variable:
      • Variable name: CUDA_VISIBLE_DEVICES
      • Variable value: Paste the GPU UUID you noted down earlier, such as GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    • Click "OK" to save.

    Why use the UUID instead of an index (0, 1, 2...)? We strongly recommend the UUID! A GPU's index (the 0 or 1 shown in nvidia-smi) can change after driver updates, reboots, or hardware changes, silently invalidating your setting. The UUID is the card's unique "ID card": stable and reliable! (A command-line alternative to the dialogs is sketched just below.)
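
Here is that sketch, using Windows' built-in setx tool. It assumes an Administrator Command Prompt, and the UUID shown is a placeholder you must replace with your own from nvidia-smi -L:

```bat
:: Run in an Administrator Command Prompt.
:: The /M flag writes a system-wide (machine) variable instead of a user one.
setx OLLAMA_GPU_LAYER cuda /M

:: Optional: pin Ollama to one GPU. Replace the placeholder with your own
:: UUID from "nvidia-smi -L".
setx CUDA_VISIBLE_DEVICES "GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" /M
```

Note that setx only affects processes started afterwards, which is one more reason the restart in Step 2 matters.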

Step 2: Restart! Restart! Restart! (Say Important Things Three Times!)

  • Super Critical: For Windows and Ollama to fully recognize and apply the environment variables you just set, we strongly recommend restarting your computer! Restarting just Ollama or the terminal sometimes works, but a full reboot is the safest, most thorough way to make sure everything takes effect.

Step 3: Verify that the GPU Has Been "Requisitioned"

  1. Run an Ollama Model:
    • Open a new Command Prompt window.
    • Run a model, for example: ollama run llama3 (or your commonly used model). Let it run.
  2. Summon nvidia-smi to monitor:
    • Open another Command Prompt window.

    • Enter nvidia-smi and press Enter.

    • Observe the output:

      • Do you see the GPU Utilization (GPU-Util) is no longer 0%?
      • Do you see that the Memory-Usage has increased significantly?
      • In the Processes section below, can you see an ollama or similar process using GPU memory?
      • If all the above points are met, then congratulations! Your Ollama has successfully used GPU acceleration! Enjoy the speed boost!
    • You can also try running ollama ps. Depending on your Ollama version, it may show whether a loaded model is on the GPU or CPU, but rely on nvidia-smi for the definitive confirmation that the GPU is working.
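
For reference, a heavily trimmed nvidia-smi snapshot with a model loaded might look roughly like this (all numbers here are made up for illustration; yours will differ):

```
+------------------------------------------------------------------+
| GPU  Name                      Memory-Usage          GPU-Util    |
|   0  NVIDIA GeForce RTX 3080   6432MiB / 10240MiB        87%     |
|------------------------------------------------------------------|
| Processes:                                                       |
|  GPU   PID   Type   Process name                  GPU Memory     |
|    0  12345    C    ...\ollama.exe                  6120MiB      |
+------------------------------------------------------------------+
```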

Linux Systems: Unleash the Penguin's GPU Potential!

On Linux, if Ollama is running through a systemd service (as is the default for many installation methods), we need to modify the service configuration file to specify the GPU. Manually editing is prone to errors, so we'll use a small script to help.

Step 1: Use a Script to Smartly Configure the GPU

This script will help you automatically find Ollama's systemd service file and safely add or modify the CUDA_VISIBLE_DEVICES setting.

  1. Create a script file:

    • Open a terminal and use your favorite editor (such as nano or vim) to create a file named ollama_gpu_selector.sh:
      ```bash
      nano ollama_gpu_selector.sh
      ```
  2. Copy and paste the script code:

    • Copy and paste the entire following code into the editor:
    ```bash
    #!/bin/bash
    
    # --- Ollama GPU Selector Script ---
    # Purpose: Safely sets CUDA_VISIBLE_DEVICES for the Ollama systemd service.
    
    OLLAMA_SERVICE_FILE="/etc/systemd/system/ollama.service"
    
    # Function to validate input (must be comma-separated numbers, e.g., 0 or 0,1)
    validate_input(){
      local input="$1"
      # Basic check: allow digits, commas, and ensure it doesn't start/end with a comma or have consecutive commas
      if [[ ! "$input" =~ ^[0-9]+(,[0-9]+)*$ ]]; then
        echo "Error: Invalid input '$input'. Please enter GPU indices (like 0, or 0,1) separated by commas, no spaces."
        exit 1
      fi
      # You could add more checks here, e.g., verify indices against nvidia-smi -L output if needed
    }
    
    # Function to update the systemd service file
    update_service(){
      local devices="$1"
      local env_line="Environment=\"CUDA_VISIBLE_DEVICES=$devices\""
    
      echo "Attempting to update $OLLAMA_SERVICE_FILE..."
      echo "Setting CUDA_VISIBLE_DEVICES to: $devices"
    
      # Check if the service file exists
      if [ ! -f "$OLLAMA_SERVICE_FILE" ]; then
        echo "Error: Ollama service file not found at $OLLAMA_SERVICE_FILE"
        echo "Please ensure Ollama is installed correctly as a systemd service."
        exit 1
      fi
    
      # Edit the file in place with sudo sed (root permissions are needed)
      # Replace the line if it already exists, otherwise add it under [Service]
      if grep -q '^Environment="CUDA_VISIBLE_DEVICES=' "$OLLAMA_SERVICE_FILE"; then
          echo "Updating existing CUDA_VISIBLE_DEVICES line..."
          sudo sed -i 's|^Environment="CUDA_VISIBLE_DEVICES=.*|'"$env_line"'|' "$OLLAMA_SERVICE_FILE"
      else
          echo "Adding new CUDA_VISIBLE_DEVICES line under [Service]..."
          # Insert the line after the [Service] tag
          sudo sed -i '/\[Service\]/a '"$env_line" "$OLLAMA_SERVICE_FILE"
      fi
    
      if [ $? -ne 0 ]; then
        echo "Error: Failed to modify $OLLAMA_SERVICE_FILE. Check permissions or file content."
        exit 1
      fi
    
      echo "Reloading systemd daemon and restarting Ollama service..."
      sudo systemctl daemon-reload
      sudo systemctl restart ollama.service
    
      if [ $? -eq 0 ]; then
          echo "Successfully updated and restarted Ollama service."
          echo "Run 'systemctl status ollama.service' and 'nvidia-smi' while running a model to verify."
      else
          echo "Warning: Ollama service might not have restarted correctly. Check 'systemctl status ollama.service'."
      fi
    }
    
    # --- Main script logic ---
    # Display available GPUs using nvidia-smi -L for user reference
    echo "Available GPUs on this system:"
    nvidia-smi -L
    echo "-------------------------------------"
    
    # Check if arguments are passed
    if [ "$#" -eq 0 ]; then
      # Prompt user for CUDA_VISIBLE_DEVICES values if no arguments are passed
      read -p "Enter the GPU index/indices you want Ollama to use (e.g., 0 or 0,1): " cuda_values
      validate_input "$cuda_values"
      update_service "$cuda_values"
    else
      # Use the first argument as CUDA_VISIBLE_DEVICES values
      cuda_values="$1"
      echo "Using provided argument for GPU indices: $cuda_values"
      validate_input "$cuda_values"
      update_service "$cuda_values"
    fi
    
    exit 0
    ```

    • Save and exit the editor (nano: Ctrl+X, Y, Enter).

    What does this script do? It first runs nvidia-smi -L to show the available GPUs and their indices, then asks which one (or several, separated by commas) you want to use. It then uses sudo to modify /etc/systemd/system/ollama.service, adding or updating the line Environment="CUDA_VISIBLE_DEVICES=the indices you entered". Finally, it reloads the systemd configuration and restarts the Ollama service for you. Safer and more convenient than editing by hand!

  3. Give the script execute permission:

    ```bash
    chmod +x ollama_gpu_selector.sh
    ```
  4. Run the script (requires sudo):

    ```bash
    sudo ./ollama_gpu_selector.sh
    ```
    • The script will first display your GPU list.
    • Then, it prompts you for the GPU index or indices you want to use. Check the output of nvidia-smi -L; indices usually start at 0. To use the first card, enter 0. To use the first and second cards together (assuming your hardware and drivers support multi-GPU use), enter 0,1.
    • Press Enter, and the script will automatically complete the rest of the work.
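
A sample session might look like this (the GPU name and UUID are just examples; all the messages come from the script above):

```
$ sudo ./ollama_gpu_selector.sh
Available GPUs on this system:
GPU 0: NVIDIA GeForce RTX 3080 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
-------------------------------------
Enter the GPU index/indices you want Ollama to use (e.g., 0 or 0,1): 0
Attempting to update /etc/systemd/system/ollama.service...
Setting CUDA_VISIBLE_DEVICES to: 0
Adding new CUDA_VISIBLE_DEVICES line under [Service]...
Reloading systemd daemon and restarting Ollama service...
Successfully updated and restarted Ollama service.
Run 'systemctl status ollama.service' and 'nvidia-smi' while running a model to verify.
```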

Step 2: (Optional) Check the Configuration File

If you're not sure, you can check whether the configuration file was actually modified correctly:

```bash
cat /etc/systemd/system/ollama.service
```

Look for something like Environment="CUDA_VISIBLE_DEVICES=0" (or the other number you entered) in the output.
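
You can also filter for just that line, or ask systemd which environment it actually loaded. Both commands below are standard, and assume the default service name ollama.service:

```bash
# Show only the relevant line from the unit file
grep 'CUDA_VISIBLE_DEVICES' /etc/systemd/system/ollama.service

# Show the environment systemd will pass to the service
systemctl show ollama.service --property=Environment
```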

Step 3: (Script Already Done) Restart the Ollama Service

The script above already executed these restart commands for you at the end of its run:

```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
```

If you manually modified the configuration file, or want to confirm again, you can manually run these two commands.
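
Incidentally, if you'd rather not touch the unit file directly at all, systemd's drop-in override mechanism is a clean alternative to the script. A minimal sketch (systemctl edit opens an editor and writes an override file for you):

```bash
# Opens your default editor on an override file for the service
sudo systemctl edit ollama.service

# In the editor, add the following lines, then save and exit:
#   [Service]
#   Environment="CUDA_VISIBLE_DEVICES=0"

# systemctl edit reloads the daemon automatically; restart to apply
sudo systemctl restart ollama.service
```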

Step 4: Verify that the GPU is Working

Same as Windows:

  1. Run an Ollama model: ollama run llama3
  2. In another terminal window, run nvidia-smi and observe the GPU utilization, VRAM usage, and process list. Seeing Ollama using the GPU means it's working correctly!
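
Tip: instead of re-running nvidia-smi by hand, you can keep a live view open with the standard watch utility:

```bash
# Refresh the nvidia-smi display every second while the model is answering
watch -n 1 nvidia-smi
```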

Now, your Ollama should have successfully switched to GPU acceleration mode! Try it out and feel the leap in AI inference speed!