Training and inference for large AI models sound high-end, but they are essentially "fortune-telling", except that what gets computed is data, not your love life.
In AI, GPUs (graphics processors) matter more than CPUs (central processing units), and, more to the point, in practice only NVIDIA's GPUs count, with Intel and AMD trailing far behind.
GPU vs. CPU: One Fights in a Mob, the Other Is a Lone Wolf
Imagine that training a large AI model is like moving bricks.
A CPU is the "all-rounder": arithmetic, logic, scheduling, it handles them all well no matter how complex they get. But it has only a handful of cores, a few dozen at most, so however fast it moves, it can only carry a few dozen bricks at a time. It works hard but gets through the pile slowly.
And the GPU? It has a frightening number of cores, easily thousands or tens of thousands. Each core can carry only one brick, but sheer numbers win: send tens of thousands of little helpers out together, and the pile is gone in no time.
The core workload of AI training and inference is matrix math: simply put, huge piles of numbers lining up to be added, subtracted, and multiplied, like a mountain of red bricks waiting to be hauled. It is a simple job that needs no brains, only hands.
The "massive core parallelism" capability of GPU comes in handy, which can process thousands or tens of thousands of small tasks at the same time, which is tens or even hundreds of times faster than CPU.
The CPU, by contrast, is better suited to complex serial tasks, like running a single-player game or a word processor. AI simply has too many bricks: carrying a few dozen at a time, the CPU can never catch up no matter how hard it grinds.
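Don't take my word for it. Here is a minimal sketch you can run yourself, assuming PyTorch is installed; the 4096x4096 matrix size is arbitrary, and whatever speedup you see depends entirely on your machine:

```python
# A minimal sketch of the "brick-moving" gap, assuming PyTorch is installed.
import time
import torch

n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

# CPU: the lone wolf hauls bricks with a few dozen cores at most.
t0 = time.perf_counter()
_ = a @ b
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    _ = a_gpu @ b_gpu                 # warm-up: the first CUDA call pays one-time startup costs
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()          # GPU kernels run asynchronously; wait before stopping the clock
    gpu_s = time.perf_counter() - t0
    print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.4f}s  roughly {cpu_s / gpu_s:.0f}x faster")
else:
    print(f"CPU: {cpu_s:.3f}s (no CUDA GPU detected)")
```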
Why Does NVIDIA Dominate? AMD and Intel Are Crying in the Bathroom
Now for the real question: NVIDIA is hardly the only GPU maker. AMD and Intel sell graphics cards too, so why does the AI crowd happily line up to pay NVIDIA's prices? The answer is crude but simple: NVIDIA doesn't just sell hardware, it has taken the entire ecosystem hostage.
First, the software ecosystem is untouchable. NVIDIA's killer weapon is CUDA, a parallel programming platform built exclusively for its own GPUs. For AI engineers writing training code, using CUDA feels like cheating: simple, efficient, and supported everywhere. AMD has ROCm and Intel has oneAPI, but both are either less mature or feel like sitting a math exam just to get started. Neither comes close to CUDA's ease of use. Here is what "cheating" looks like in practice, in the sketch below.
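A hedged sketch of how little ceremony CUDA demands from a PyTorch user; the toy linear layer here is purely hypothetical, and the point is the one-string device switch:

```python
# A minimal sketch: moving work onto an NVIDIA GPU in PyTorch is one string.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"  # graceful CPU fallback

model = nn.Linear(1024, 1024).to(device)   # hypothetical toy "model"
x = torch.randn(32, 1024, device=device)
y = model(x)                               # dispatched to thousands of GPU cores if present
print(y.shape, "on", y.device)
```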
Second, first-mover advantage plus a market bought with money. NVIDIA bet on GPU computing early, shipping CUDA back in 2007, and has spent the years since turning AI researchers into "NVIDIA believers". AMD and Intel? By the time they woke up, NVIDIA had the AI territory locked down. Catching up now is a very tall order.
Third, the hardware itself is no slouch. NVIDIA's data-center GPUs (the A100 and H100, for example) are purpose-built for AI, with enormous memory bandwidth, dedicated Tensor Cores for matrix math, and explosive raw compute. AMD's and Intel's cards are fine for gaming, but on AI workloads they always fall a step short. NVIDIA is the purpose-built excavator for AI brick-moving; AMD and Intel are still handing out household shovels, and the efficiency gap is brutal.
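If you have an NVIDIA card in your machine, a small sketch (again assuming PyTorch with CUDA support) will read off the raw numbers behind the bragging:

```python
# A minimal sketch for inspecting the card behind the hype (PyTorch + CUDA assumed).
import torch

if torch.cuda.is_available():
    p = torch.cuda.get_device_properties(0)
    # multi_processor_count is the number of SMs; each SM contains many CUDA cores
    print(f"{p.name}: {p.total_memory / 2**30:.0f} GiB VRAM, {p.multi_processor_count} SMs")
else:
    print("No CUDA GPU found")
```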
The AI Crowd: More Money Than Sense
So: the GPU crushes the CPU on AI workloads through sheer strength in numbers, and NVIDIA's dominance comes from the combination of hardware + software + foresight.
AMD and Intel are not out of chances, but they will have to try much harder; otherwise they can only watch NVIDIA keep counting money until its hands cramp.
In the AI industry, burning money is daily routine. Buying NVIDIA GPUs is like buying a cheat code: expensive, but you win from the starting line. Funny, isn't it? Before AI saves the world, it has already saved NVIDIA's stock price!