Deploying 70B Large Language Models Online with Cloudflare Workers AI | pyVideoTrans-Open Source Video Translation Tool -pyvideotrans.com github.com/jianchang512/pyvideotrans

Want to experience the power of large language models but struggling with insufficient local computer performance? We usually deploy models locally with tools like Ollama, but limited hardware often restricts us to smaller models such as 1.5B (1.5 billion parameters), 7B (7 billion), or 14B (14 billion). Running a 70-billion-parameter model is a huge challenge for local hardware.

Now you can run large language models such as a 70B model online with Cloudflare Workers AI and access them from the public internet. With the small Worker script in this guide, the interface is compatible with OpenAI's chat API, so you can use it just as you would the OpenAI API. The only downside is the limited daily free quota; usage beyond it is charged. If you're interested, give it a try!

Preparation: Log in to Cloudflare and Bind a Domain

If you don't have your own domain yet, Cloudflare provides a free account domain. Note, however, that this free domain may not be directly accessible in some regions, where you may need a proxy or VPN to reach it.

First, open the Cloudflare website (https://dash.cloudflare.com) and log in to your account.

Step 1: Create a Workers AI

Find Workers AI: In the left navigation bar of the Cloudflare control panel, find "AI" -> "Workers AI" and then click "Create from Worker template".
Create Worker: Then click "Create Worker".
Enter Worker Name: Enter a name made up of English letters; it becomes part of the default account domain for your Worker.

Deploy: Click the "Deploy" button in the lower right corner to complete the creation of the Worker.

Step 2: Modify Code and Deploy the Llama 3.3 70b Large Model

Enter Code Editing: After deployment, click "Edit code" on the page that appears.
Clear Code: Delete all the preset code in the editor.
Paste Code: Copy and paste the following code into the code editor:
Here we use the llama-3.3-70b-instruct-fp8-fast model, which has 70 billion parameters.
You can replace it with other models listed on the Cloudflare Models page, such as the open-source DeepSeek models. At the time of writing, however, llama-3.3-70b-instruct-fp8-fast is one of the largest and most capable models available there.

```javascript
// Key that clients must send; replace the default with your own secret
const API_KEY = '123456';
export default {
  async fetch(request, env) {

    let url = new URL(request.url);
    const path = url.pathname;

    const authHeader = request.headers.get("authorization") || request.headers.get("x-api-key");
    // Accept "Bearer <key>" as well as a bare key (e.g. sent via x-api-key)
    const apiKey = authHeader?.startsWith("Bearer ") ? authHeader.slice(7) : authHeader;
                        
    if (API_KEY && apiKey !== API_KEY) {

      return new Response(JSON.stringify({
        error: {
            message: "Invalid API key. Use 'Authorization: Bearer your-api-key' header",
            type: "invalid_request_error",
            param: null,
            code: "invalid_api_key"
        }
      }), {
          status: 401,
          headers: {
              "Content-Type": "application/json",
          }
      });
    }

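    if (path === "/v1/chat/completions") {
      const requestBody = await request.json();
      // OpenAI-style clients send the conversation in `messages`
      const { messages } = requestBody;
      const chat = {
        messages: messages
      };
      // Pass only the chat messages to the model
      const response = await env.AI.run('@cf/meta/llama-3.3-70b-instruct-fp8-fast', chat);

      // Wrap the model output in a minimal OpenAI-style response
      const resdata = {
        choices: [{ message: { role: "assistant", content: response.response } }]
      };
      return Response.json(resdata);
    }

    // Any other path: return 404 instead of ending without a response
    return new Response("Not Found", { status: 404 });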
  }
};
```

Deploy Code: After pasting the code, click the "Deploy" button.
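
The Worker code returns only the reply text inside `choices`. Some OpenAI-compatible tools may also read fields such as `id`, `object`, `model`, or `finish_reason`. Below is a minimal sketch of a fuller response wrapper, assuming the field names of the OpenAI chat completion format. The helper name `toOpenAIResponse` and the `chatcmpl-` id scheme are illustrative and not part of the original code.

```javascript
// Hypothetical helper: wraps plain model text in an OpenAI-style
// chat completion object. The id scheme is made up for illustration.
function toOpenAIResponse(text, model) {
  return {
    id: "chatcmpl-" + Date.now().toString(36),
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000), // Unix time in seconds
    model: model,
    choices: [
      {
        index: 0,
        message: { role: "assistant", content: text },
        finish_reason: "stop",
      },
    ],
  };
}
```

Inside the Worker, `Response.json(toOpenAIResponse(response.response, '@cf/meta/llama-3.3-70b-instruct-fp8-fast'))` could then take the place of the hand-built `resdata` object.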

Step 3: Bind a Custom Domain

Return to Settings: Click the back button on the left to return to the Worker management page, find "Settings" -> "Domains & Routes".

Add Custom Domain: Click "Add domain", then select "Custom domain" and enter the subdomain you have already bound to Cloudflare.

Step 4: Use in Tools Compatible with OpenAI

After adding a custom domain, you can use this large model in any tool that is compatible with the OpenAI API.

API Key: The API_KEY you set in the code, which defaults to 123456.
API Address: https://your-custom-domain/v1
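
To check the endpoint without a third-party tool, you can call it directly. The following is a minimal sketch, assuming a runtime with a global `fetch` (a modern browser or recent Node.js). The function names `buildChatRequest` and `chat` are illustrative, and `your-custom-domain` is a placeholder for the domain you bound in Step 3.

```javascript
// Builds the URL and fetch options for a chat request.
// baseUrl is the "API Address" (ending in /v1); apiKey is the API_KEY set in the Worker.
function buildChatRequest(baseUrl, apiKey, messages) {
  return {
    // Strip trailing slashes so the path is joined cleanly
    url: baseUrl.replace(/\/+$/, "") + "/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + apiKey,
      },
      body: JSON.stringify({ messages }),
    },
  };
}

// Sends the request and returns the assistant's reply text.
async function chat(baseUrl, apiKey, messages) {
  const { url, options } = buildChatRequest(baseUrl, apiKey, messages);
  const res = await fetch(url, options);
  if (!res.ok) {
    // For example, 401 when the API key does not match
    throw new Error("Request failed with status " + res.status);
  }
  const data = await res.json();
  return data.choices[0].message.content;
}
```

For example, `chat("https://your-custom-domain/v1", "123456", [{ role: "user", content: "Hello" }]).then(console.log)` would print the model's reply if the Worker is deployed as described.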

Thanks to Cloudflare's powerful GPU resources, it's very smooth to use.

Precautions

Free Quota: Cloudflare Workers AI includes a free daily allocation of 10,000 Neurons (Cloudflare's unit for measuring AI usage, not tokens); usage beyond that is charged.
Pricing Details: You can view detailed pricing information on the Cloudflare official pricing page (https://developers.cloudflare.com/workers-ai/platform/pricing/).
