Want to try large language models but held back by the performance of your local computer? We typically deploy models locally with tools like Ollama, but limited hardware usually restricts us to smaller models with 1.5b (1.5 billion), 7b (7 billion), or 14b (14 billion) parameters. Running a 70-billion-parameter model is a huge challenge for local hardware.

Now you can use Cloudflare's Workers AI to run a 70b model online and access it from the public network. With a small Worker script in front of it, the endpoint accepts OpenAI-style requests, so you can use it much like the OpenAI API. The only downside is the limited daily free tier; usage beyond it incurs charges. If you're interested, give it a try!

Preparation: Log in to Cloudflare and Bind a Domain

If you don't have your own domain yet, Cloudflare provides a free default domain with your account. Note, however, that this free domain may not be directly reachable in some regions, where you might need a proxy or VPN to access it.

First, open the Cloudflare website (https://dash.cloudflare.com) and log in to your account.

Step 1: Create a Workers AI Worker

  1. Find Workers AI: In the Cloudflare control panel, find "AI" -> "Workers AI" in the left navigation bar, then click "Create from Worker template".

  2. Create Worker: Next, click "Create Worker".

  3. Enter Worker Name: Enter a name made of English letters; this name becomes part of the default domain for your Worker.

  4. Deploy: Click the "Deploy" button in the bottom right corner to finish creating the Worker.

Step 2: Modify the Code and Deploy the Llama 3.3 70b Large Language Model

  1. Enter Code Editor: After the deployment completes, click "Edit code".

  2. Clear Code: Delete all the preset code in the editor.

  3. Paste Code: Copy and paste the code below into the code editor.

    The code uses the llama-3.3-70b-instruct-fp8-fast model, which has 70 billion parameters.

    You can swap in another model from the Cloudflare Models page, such as one of the open-source DeepSeek models, by changing the model ID passed to env.AI.run. At the time of writing, llama-3.3-70b-instruct-fp8-fast is one of the largest and most capable models offered there.

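    // Shared secret that clients must send; change it before deploying.
    const API_KEY = '123456';

    // Build a JSON error response in the OpenAI error format.
    function errorResponse(message, type, code, status) {
      return Response.json({ error: { message, type, param: null, code } }, { status });
    }

    export default {
      async fetch(request, env) {
        const url = new URL(request.url);
        const path = url.pathname;

        // Accept "Authorization: Bearer <key>" or a bare key in "x-api-key".
        const authHeader = request.headers.get("authorization");
        const apiKey = authHeader?.startsWith("Bearer ")
          ? authHeader.slice(7)
          : request.headers.get("x-api-key");

        if (API_KEY && apiKey !== API_KEY) {
          return errorResponse(
            "Invalid API key. Use 'Authorization: Bearer your-api-key' header",
            "invalid_request_error", "invalid_api_key", 401);
        }

        if (path === "/v1/chat/completions") {
          const requestBody = await request.json();
          // messages - chat style input (OpenAI clients send "messages", plural)
          const { messages } = requestBody;
          const chat = { messages };
          // env.AI is the Workers AI binding; it must be named "AI" on this Worker.
          const response = await env.AI.run('@cf/meta/llama-3.3-70b-instruct-fp8-fast', chat);

          // Wrap the model output in the shape OpenAI clients read.
          const resdata = {
            choices: [{ message: { role: "assistant", content: response.response } }]
          };
          return Response.json(resdata);
        }

        // Without this, any other path returns undefined and the Worker throws.
        return errorResponse("Not found", "invalid_request_error", "not_found", 404);
      }
    };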
  4. Deploy Code: After pasting the code, click the "Deploy" button.
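
The Worker above returns only a bare choices array. Some OpenAI-compatible clients also read fields such as id, object, created, or finish_reason, so a fuller wrapper may help. The following is a minimal sketch rather than part of the original code: toChatCompletion is a hypothetical helper, the generated id is purely illustrative, and finish_reason is assumed to be "stop" because the Worker code reads only response.response.

```javascript
// Hypothetical helper (not in the original Worker): wraps a Workers AI
// text result ({ response: "..." }) in an OpenAI-style chat.completion object.
function toChatCompletion(aiResult, model) {
  return {
    id: "chatcmpl-" + Date.now().toString(36), // illustrative id, not a real OpenAI id
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000), // Unix time in seconds
    model,
    choices: [
      {
        index: 0,
        message: { role: "assistant", content: aiResult.response ?? "" },
        finish_reason: "stop", // assumption: the reply is treated as complete
      },
    ],
  };
}

// Usage with a stand-in result object:
const example = toChatCompletion(
  { response: "Hello!" },
  "@cf/meta/llama-3.3-70b-instruct-fp8-fast"
);
console.log(example.choices[0].message.content);
```

Inside the Worker, the return statement could then become Response.json(toChatCompletion(response, modelId)), where modelId is whatever model ID you pass to env.AI.run.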

Step 3: Bind a Custom Domain

  1. Return to Settings: Click the return button on the left to go back to the Worker's management page, then open "Settings" -> "Domains & Routes".

  2. Add Custom Domain: Click "Add Domain", select "Custom Domain", and enter a subdomain of a domain you have already added to Cloudflare.

Step 4: Use in OpenAI-Compatible Tools

After adding a custom domain, you can use this large language model in any tool that is compatible with the OpenAI API.

  • API Key: The API_KEY value you set in the code, which defaults to 123456.
  • API Address: https://your-custom-domain/v1
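
As a quick check outside of any GUI tool, the endpoint can also be called directly with fetch. The sketch below assumes Node.js 18+ (or any runtime with a global fetch). buildChatRequest and askModel are hypothetical helper names, and your-custom-domain and the key are placeholders for your own values.

```javascript
// Builds the URL and fetch options for a chat request against the Worker.
// baseUrl is the "API Address" above; apiKey is the API_KEY set in the Worker.
function buildChatRequest(baseUrl, apiKey, messages) {
  return {
    // Trim trailing slashes so "https://host/v1/" and "https://host/v1" both work.
    url: baseUrl.replace(/\/+$/, "") + "/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ messages }),
    },
  };
}

// Sends a single user prompt and returns the assistant's reply text.
async function askModel(baseUrl, apiKey, prompt) {
  const { url, init } = buildChatRequest(baseUrl, apiKey, [
    { role: "user", content: prompt },
  ]);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

// Example (requires a deployed Worker; the domain is a placeholder):
// askModel("https://your-custom-domain/v1", "123456", "Hello").then(console.log);
```

A 401 status here usually means the key does not match API_KEY in the Worker code.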

Thanks to Cloudflare's powerful GPU resources, the experience is very smooth.

Precautions

Workers AI has a limited daily free tier, and usage beyond it is billed, so keep track of how much you use.