The CEO of generative AI company Anthropic said that the Chinese models that caused tech stocks to crash were not developed as cheaply as commonly thought.
Dario Amodei admitted that DeepSeek's V3 model was cheaper to train than leading US AI models, but said that it was not something that "fundamentally changes the economics of LLMs."
In a lengthy blog post, Amodei said that DeepSeek does not "do for $6 million what cost US AI companies billions."
In its research paper, DeepSeek said that the model's final training run cost the equivalent of $5.6m in rented GPU time. The figure is notional, however: the company did not actually rent the servers, and the total excludes the research and experimental runs that preceded the final one.
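The arithmetic behind that notional figure is straightforward. A minimal sketch, assuming the numbers reported in DeepSeek's technical report (roughly 2.788 million H800 GPU-hours, priced at a hypothetical rental rate of $2 per GPU-hour):

```python
# Back-of-envelope reproduction of DeepSeek's reported training cost.
# Both inputs are taken from DeepSeek's own technical report; the
# rental rate is a hypothetical price, not an actual invoice.
gpu_hours = 2.788e6        # H800 GPU-hours for the final training run
rental_rate_usd = 2.0      # assumed rental price per GPU-hour

training_cost = gpu_hours * rental_rate_usd
print(f"Notional training cost: ${training_cost / 1e6:.2f}M")  # ~ $5.58M
```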
Amodei said that Claude 3.5 Sonnet, a mid-sized model trained some nine to twelve months earlier, cost a few tens of millions of dollars to train. "I think a fair statement is 'DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested).'"
With the cost of reaching a given level of model performance already falling steadily over time, Amodei argued that DeepSeek's release was not out of the norm. He claimed that the "historical trend of the cost curve decrease is ~4x per year."
He continued: "We’d expect a model 3-4x cheaper than 3.5 Sonnet/GPT-4o around now. Since DeepSeek-V3 is worse than those US frontier models - let’s say by ~2x on the scaling curve, which I think is quite generous to DeepSeek-V3 - that means it would be totally normal, totally 'on trend,' if DeepSeek-V3 training cost ~8x less than the current US models developed a year ago."
All of this, he contended, means that DeepSeek is "not a unique breakthrough or something that fundamentally changes the economics of LLMs; it’s an expected point on an ongoing cost reduction curve."
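Putting his numbers together makes the back-of-envelope reasoning concrete. In the sketch below, the 4x-per-year trend, the roughly one-year gap since 3.5 Sonnet/GPT-4o, and the ~2x quality discount for DeepSeek-V3 are all Amodei's stated estimates, not measured data:

```python
# Amodei's "on trend" arithmetic, using his own stated assumptions.
annual_cost_decline = 4.0   # "~4x per year" historical cost-curve trend
months_elapsed = 12         # roughly a year since 3.5 Sonnet / GPT-4o
quality_gap = 2.0           # DeepSeek-V3 assumed ~2x behind on the scaling curve

# Cheapening expected from the trend alone: 4^(12/12) = ~4x.
trend_factor = annual_cost_decline ** (months_elapsed / 12)

# A model ~2x lower on the scaling curve is cheaper still, so the
# on-trend expectation is roughly trend * gap = ~8x cheaper.
expected_cost_ratio = trend_factor * quality_gap
print(f"On-trend cost reduction vs. year-old US models: ~{expected_cost_ratio:.0f}x")
```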
The difference, and the reason DeepSeek caused a market rout, is that DeepSeek is Chinese, Amodei said.
It is unclear whether DeepSeek skirted export controls to build out an H100 GPU cluster, as some have suggested. However, the company is believed to use H800s (now banned) and H20s (not banned) as part of its compute infrastructure.
"It appears that a substantial fraction of DeepSeek's AI chip fleet consists of chips that haven't been banned (but should be); chips that were shipped before they were banned; and some that seem very likely to have been smuggled."
He called for further tightening of export controls to prevent China from getting more chips. "I don't see DeepSeek themselves as adversaries and the point isn't to target them in particular," Amodei claimed, adding that his concern was with authoritarian governments and that beating China was "existentially important."
Amodei, whose company plans to use gigawatt-scale data centers, said that the latest market wobble would not affect the mega buildouts.
"To the extent that US labs haven't already discovered them, the efficiency innovations DeepSeek developed will soon be applied by both US and Chinese labs to train multi-billion dollar models," Amodei said.
"These will perform better than the multi-billion models they were previously planning to train — but they'll still spend multi-billions. That number will continue going up, until we reach AI that is smarter than almost all humans at almost all things."