Implications for US Dominance, Nvidia and TSMC
What’s all the Fuss About?
Chinese AI startup DeepSeek ignited a round of self-doubt in the US stock markets, sending AI-related stocks into a tailspin. While this pessimism may be overblown, it is a good time to take a deep dive into DeepSeek.
DeepSeek unveiled V3 in December and R1 in January. Now, at the World Economic Forum (WEF) and around the world, it is the hottest topic of conversation.
Its app skyrocketed to the top of the U.S. free app charts just a week after launch. President Donald Trump called the Chinese company’s rapid rise “a wake-up call” for the U.S. tech industry, as its AI breakthrough sent shockwaves through Wall Street.
To train V3, DeepSeek managed with just 2,048 GPUs running for 57 days. The model’s training consumed 2.78 million GPU hours on Nvidia H800 chips – remarkably modest for a 671-billion-parameter model. V3 employs a mixture-of-experts architecture that activates only 37 billion parameters per token.
The Breakthrough Thesis
In comparison, Meta needed approximately 30.8 million GPU hours – roughly 11 times more computing power – to train its Llama 3 model, which actually has fewer parameters at 405 billion.
Another calculation looks at API pricing: OpenAI’s o1 costs $15 per million input tokens and $60 per million output tokens, while DeepSeek Reasoner, which is based on the R1 model, costs $0.55 per million input tokens and $2.19 per million output tokens.
DeepSeek’s open-source reasoning model R1 is on par with the performance of OpenAI’s o1 in several tests, yet the company reportedly built its model for US$5.6 million, only a fraction of the cost of OpenAI’s o1.
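These figures are easy to sanity-check. The short script below reproduces them from the numbers quoted above; the US$2 per GPU-hour rental rate is an assumption (it is the rate DeepSeek’s own V3 technical report uses for its cost estimate).

```python
# Back-of-the-envelope check of the figures above. The US$2 per GPU-hour
# rate is an assumption taken from DeepSeek's V3 technical report;
# all other numbers are quoted in this article.
gpus, days = 2_048, 57
gpu_hours = gpus * days * 24
print(f"V3 training: {gpu_hours / 1e6:.2f}M GPU hours")                # ~2.80M

llama3_gpu_hours = 30.8e6
print(f"Llama 3 405B used ~{llama3_gpu_hours / gpu_hours:.0f}x more")  # ~11x

cost = 2.78e6 * 2.0  # reported H800 GPU hours x assumed $2/hour
print(f"implied training cost: ${cost / 1e6:.2f}M")                    # ~$5.6M

# API pricing per million tokens: OpenAI o1 vs DeepSeek Reasoner (R1)
o1_in, o1_out = 15.00, 60.00
r1_in, r1_out = 0.55, 2.19
print(f"o1 is ~{o1_in / r1_in:.0f}x (input) and ~{o1_out / r1_out:.0f}x "
      f"(output) more expensive")                                      # ~27x each
```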
Some say DeepSeek-R1’s reasoning performance marks a big win for China, especially because the entire work is open-source, including how the company trained the model. Nevertheless, DeepSeek’s extremely low cost and efficiency in training AI models are inviting scrutiny of how it could spend only US$5.6 million to accomplish what cost others at least ten times more, and still outperform them.
DeepSeek shattered the impression that the US was way ahead of China in AI, an impression built in large part on China’s lack of access to the most advanced Nvidia GPUs. Scale AI CEO Alexandr Wang told CNBC on the sidelines of the World Economic Forum that DeepSeek has at least 50,000 Nvidia H100 chips (a claim that has not been confirmed), which also has many people questioning the effectiveness of the export controls.
Now, who is behind DeepSeek? Why was DeepSeek able to achieve such a great result? What factors contributed to its success? And how should we interpret the implications of DeepSeek’s success for the AI competition between the US and China?
Introduction to the Journalist
Hi, I am Judy Lin, founder of TechSoda, a news platform that provides refreshing insights to the curious mind. Why soda? It is the acronym for “semiconductor”, “optics”, “digital”, and “AI”.
I am a senior journalist who has covered macroeconomics and the foreign exchange market, banking/insurance/fintech, and technology business news in Taiwan for decades. My research interests in international business strategy and geopolitics led me to cover how industrial and trade policies impact companies’ businesses, and how they should respond or take preemptive measures to navigate the uncertainty.
My studies in international business strategy and risk communication, along with my network in the semiconductor and AI community here in Asia Pacific, have been useful for analyzing technological trends and policy twists.
Seeing semiconductors become a strategic industry that many countries hold dear to their national security, I try to make my tech articles accessible to people who are not scientists or engineers but would like to know more about the semiconductor supply chain.
Background of DeepSeek, its Founder and Principal Researcher
Founder
DeepSeek was founded in July 2023 by Liang Wenfeng, a graduate of Zhejiang University’s Department of Electrical Engineering with a Master of Science in Communication Engineering. Liang founded the hedge fund High-Flyer with business partners in 2015, and it quickly rose to become the first quantitative hedge fund in China to manage more than CNY100 billion.
He grew up in the 1980s in a fifth-tier city in Guangdong.
Most Chinese entrepreneurs who, like Liang, achieve financial freedom before their forties would stay in their comfort zone. Liang instead decided in 2023 to change his career from finance to research: he invested his fund’s resources in researching general artificial intelligence to build cutting-edge models under his own brand.
“High-Flyer is doing big models that are not directly related to quant trading and finance, and we have established a new company called [DeepSeek] to do this. What we want to do is general artificial intelligence, or AGI. Large language models may be a necessary path to AGI, and they initially exhibit characteristics of AGI, so we will start with large language models (LLMs),” Liang said in an interview.
After DeepSeek launched its V2 model, it unintentionally triggered a price war in China’s AI industry. Founder Liang Wenfeng stated that their pricing was based on cost efficiency rather than a market disruption strategy. However, major players like ByteDance, Alibaba, and Tencent were forced to follow suit, leading to a pricing shift reminiscent of the internet subsidy era.
DeepSeek distinguishes itself by prioritizing AI research over immediate commercialization, focusing on foundational advancements rather than application development. Liang emphasizes that China must shift from imitating Western technology to original innovation, aiming to close gaps in model efficiency and capabilities. He believes open-sourcing and ecosystem-building are more sustainable than proprietary models.
Despite financial and resource challenges, DeepSeek remains committed to AGI research, with a long-term strategy centered on mathematical reasoning, multimodality, and language understanding.
Liang believes hardcore innovation will only increase in the future. It’s not widely understood now because society as a whole needs to learn from reality. “When this society starts celebrating the success of deep-tech innovators, collective perceptions will change. We just need more real-world examples and time to allow that process to unfold,” Liang said in an interview in July 2024.
Interestingly, a reporter asked why DeepSeek is confident in focusing solely on research when many other AI startups insist on balancing model development with applications, given that technical leads are never permanent.
Liang Wenfeng said, “All strategies are products of the past generation and may not hold true in the future. Discussing AI’s future profitability using the commercial logic of the internet era is like comparing Tencent’s early days to General Electric or Coca-Cola—it’s essentially carving a boat to mark a sword’s position, an outdated approach.”
The people they hire don’t necessarily come from computer science departments. Besides STEM talent, DeepSeek has also recruited liberal arts professionals, called “Data Numero Uno”, to provide historical, cultural, scientific, and other relevant sources of knowledge, helping technicians expand the capabilities of AGI models with high-quality textual data.
Since its inception, DeepSeek has maintained an organizational culture that is “rank-less and extremely flat”. Members of DeepSeek are divided into different research groups according to specific goals. Instead of a hierarchical relationship, there is a “natural division of labor,” with each member being responsible for the part of the project that he or she is best at and then discussing the difficulties together.
According to Liang, one result of this natural division of labor was the birth of MLA (Multi-head Latent Attention), a key technique that greatly reduces the cost of training and running the model. “MLA was initially the personal interest of a young researcher, but when we realized it had potential, we mobilized our resources to develop it, and the result was a miraculous achievement,” said Liang.
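Mechanically, the idea behind MLA is to cache one small latent vector per token instead of full per-head keys and values, reconstructing K and V from the latent on the fly. Below is a minimal numpy sketch of that low-rank compression idea; the dimensions are illustrative, and the real MLA design (for example, its handling of positional encoding) is more involved.

```python
import numpy as np

# Minimal sketch of the low-rank KV compression behind MLA.
# All dimensions are illustrative, not DeepSeek's real configuration.
d_model, d_latent, n_heads, d_head, seq_len = 1024, 128, 8, 64, 16

rng = np.random.default_rng(0)
W_dkv = rng.standard_normal((d_model, d_latent)) * 0.02          # down-projection
W_uk = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # up-projection for K
W_uv = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # up-projection for V

h = rng.standard_normal((seq_len, d_model))  # hidden states for seq_len tokens

# Standard attention caches K and V directly; MLA caches only the latent.
c_kv = h @ W_dkv                                     # (16, 128): the entire cache
k = (c_kv @ W_uk).reshape(seq_len, n_heads, d_head)  # rebuilt on the fly
v = (c_kv @ W_uv).reshape(seq_len, n_heads, d_head)

full_cache = 2 * seq_len * n_heads * d_head  # floats cached by standard attention
latent_cache = seq_len * d_latent            # floats cached by MLA
print(f"KV cache: {full_cache} -> {latent_cache} floats "
      f"({full_cache / latent_cache:.0f}x smaller)")
```

The smaller the cached latent is relative to the full keys and values, the less memory each decoding step consumes, which is where the savings come from.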
Principal Researcher – Talent and Recruitment
Liang’s idealism or curiosity alone could not make DeepSeek a success; his recruitment standards and management methods are the key, said Feng Xiqian, a Hong Kong commentator. “Liang’s hiring principle is based on ability, not experience, and core positions are filled by fresh graduates and young people who graduated only one or two years ago. To measure newcomers’ ability, apart from institutional background (mainly Tsinghua and Peking University students), he also looks at competition results and accepts nothing below a gold medal – he recruits only the top 1% of geniuses to do what 99% of Chinese companies can’t.”
Luo Fuli, Principal Researcher of DeepSeek, is one of the 139 employees who have demonstrated exceptional talent at a very young age. A Beijing resident, she has not yet turned 30 but has already published 41 papers since 2018, and was a recipient of China’s National Scholarship in 2016.
Luo earned her bachelor’s degree in computer science from Beijing Normal University and a Master of Science in Computational Linguistics from Peking University. Her first job after graduating from Peking University was at Alibaba’s DAMO Academy (for Discovery, Adventure, Momentum and Outlook), where she worked on pre-training open-source language models such as AliceMind and the multimodal model VECO. She joined High-Flyer in 2022 to do deep-learning research on strategy models and algorithm building, and later joined DeepSeek to develop the MoE LLM V2.
She is said to have accepted a CNY10 million package offered by Xiaomi founder Lei Jun just days before DeepSeek-V3 was launched.
How DeepSeek achieved low cost with comparable performance
About on-par performance
I tend to take a critical position on this. DeepSeek was established in July 2023, while OpenAI was founded in 2015. Where OpenAI and many Western AI companies had to build their generative AI from the ground up, DeepSeek, as a latecomer, was able to avoid many pitfalls its predecessors experienced and build on the foundations of open-source contributors.
Without a doubt, DeepSeek must have built on the foundation of open-source datasets or pre-trained models, since inference has to rely on pre-trained knowledge. As a company established only a year and a half ago, DeepSeek was unlikely to have built everything from scratch by itself.
Ethan Tu, founder of Taiwan AI Labs, pointed out that open-source models benefit from the accumulated results of many open-source contributions, including datasets, algorithms, and platforms – and the U.S. is still a major contributor to open source.
“The release of DeepSeek on Jan 27 only tells us that AI hegemony lies not only in the mastery of computing power, but also in the basic skills of investing in software and applications,” wrote Tu. “The technology part (of DeepSeek) is worth learning and admiring, but let’s treat the ‘China surpassing the U.S. or whatever’ rhetoric as marketing language. The marketing is so successful that related stocks plunged today.”
About low cost
1. More efficient in the way it operates
Of course, necessity is the mother of innovation. Not having access to advanced GPUs also serves as a driver for DeepSeek and other Chinese AI companies to innovate toward more efficient use of computing power.
Due to US export controls, DeepSeek had to come up with more effective ways to train its models. The team combined a series of engineering techniques to improve the model architecture, finally breaking through the technological bottleneck imposed by the export ban. Using fewer computing resources to perform complex logical reasoning tasks not only saves costs but also removes the need for the most advanced chips.
DeepSeek V3 introduces Multi-Token Prediction (MTP), enabling the model to predict multiple tokens at once with an 85-90% acceptance rate, boosting processing speed by 1.8x. It also uses a Mixture-of-Experts (MoE) architecture with 671 billion total parameters, but only 37 billion are activated per token, optimizing efficiency while leveraging the power of a massive model.
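To make the 671-billion-total/37-billion-active arithmetic concrete, here is a minimal numpy sketch of top-k expert routing, the core mechanism of an MoE layer. The sizes are toy values, and DeepSeek-V3’s actual design adds finer-grained and shared experts on top of this basic pattern.

```python
import numpy as np

# Toy top-k Mixture-of-Experts layer: the router activates only a few
# experts per token, so most parameters sit idle on any given token.
# Sizes are illustrative, not DeepSeek-V3's real configuration.
d_model, n_experts, top_k = 64, 16, 2

rng = np.random.default_rng(0)
router = rng.standard_normal((d_model, n_experts)) * 0.02
w_in = rng.standard_normal((n_experts, d_model, 4 * d_model)) * 0.02
w_out = rng.standard_normal((n_experts, 4 * d_model, d_model)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token (shape: d_model) through its top-k experts."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]       # indices of the top-k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                       # softmax over the chosen experts
    y = np.zeros_like(x)
    for g, e in zip(gates, chosen):
        hidden = np.maximum(x @ w_in[e], 0.0)  # expert FFN with ReLU
        y += g * (hidden @ w_out[e])
    return y

token = rng.standard_normal(d_model)
out = moe_layer(token)
print(f"active experts: {top_k}/{n_experts}; "
      f"DeepSeek-V3 analogue: 37B of 671B parameters per token "
      f"(~{37 / 671:.0%})")
```

Because only the chosen experts’ weights participate in each token’s forward pass, compute per token scales with the active parameters rather than the total.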
However, it is noteworthy that DeepSeek-R1 is a reasoning LLM, not an instruction-following LLM. Crane Bamboo (鶴竹子), a WeChat blogger, gave a vivid example. Suppose you want the model to analyze the energy industry.
More often than not, ChatGPT or other instruction-based generative AI models will spill out stiff, superficial text that people easily recognize as AI-written.
Crane Bamboo said the right way to use DeepSeek is to clearly specify the scenario and how you want the information to be used. What DeepSeek needs is a scenario and specific requests. With those, it can produce more precise answers catered to the prompter’s needs and cut down the time spent scrambling across a vast scope of knowledge. A more targeted scenario and purpose for the information also significantly decreases the computing power required for each task.
When users are not satisfied with DeepSeek’s answers because they are too abstract, they can prompt “說人話 (speak to me like a person)”, and it will immediately rephrase the content to make it easier to understand.
From the examples above, it is also fair to say that if users have specific scenarios and purposes in mind at the onset of prompting, content generation will be faster. Less processing time means less energy consumed, which brings down costs.
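As a concrete illustration of scenario-based prompting, here is a minimal sketch against DeepSeek’s OpenAI-compatible API. The base URL and model name are DeepSeek’s documented values at the time of writing; the prompt itself is a hypothetical example in the spirit of Crane Bamboo’s advice, not one from his post.

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; "deepseek-reasoner"
# is the R1-based model mentioned earlier in this article.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

# A bare instruction such as "Analyze the energy industry" tends to
# produce generic output. A scenario plus specific requests gives the
# reasoning model something concrete to work toward (hypothetical prompt):
prompt = (
    "I am preparing a 10-minute briefing for utility investors. "
    "Analyze how falling AI training costs could change data-center "
    "electricity demand over the next five years, and end with three "
    "concrete implications for grid operators."
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```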
Meanwhile, since it is an inference-based system, its workloads can likely run on dedicated neural-network processors, which consume less energy than relying solely on general-purpose GPUs and CPUs.
Taiwan-based AI startup Kneron claims to offer much cheaper edge AI inference solutions based on neural-network processors. Its applications are focused on smart mobility, smart security, and smart buildings.
Commenting on DeepSeek’s ability to perform with fewer chips, Taiwanese AI expert Ethan Tu noted that computing power is important, but so are data and algorithms: hardware sits at the front, software at the back. MoE is not a new idea; it is a trend, and small models will be the future.
2. Energy costs
Some argue that China burns cheaper coal, so the energy consumed by DeepSeek is cheaper. Well, not quite. The increased use of renewable energy and innovations in energy efficiency are the key.
According to China’s Energy Transition Whitepaper, released by China’s State Council in August 2024, the installed capacity of wind and photovoltaic power generation at the end of 2023 had increased tenfold compared with a decade earlier. Installed clean-energy generation accounted for 58.2% of the total, and new clean-energy generation accounted for more than half of the incremental electricity consumption of society as a whole.
The proportion of clean energy consumption in total energy consumption increased from 15.5% to 26.4%, and the proportion of coal consumption decreased by 12.1 percentage points.
Over the past decade, China has eliminated more than 100 million kilowatts of outdated coal power production capacity, and reduced pollutant emissions from the power industry by more than 90%. The rate of electrification of end-use energy in society as a whole has reached 28%. Compared with 2012, energy consumption per unit of GDP has dropped by more than 26%.
“Green energy technology has achieved new breakthroughs. China has built an R&D, design, and manufacturing system for the wind power and photovoltaic industry chains, has fully mastered large-scale third-generation pressurized water reactor and fourth-generation high-temperature gas-cooled reactor nuclear power technologies, and is a global leader across the hydropower industry chain,” the whitepaper states.
3. Government subsidies
Besides subsidies from the central government, municipal and provincial governments also offer incentives to support AI companies in China. These include tax breaks, investments, cheap rents for offices located in government-operated AI clusters, and talent training programs.
The average salary for AI-related talent fresh out of school or graduate school is around CNY15k-25k, which is already considered very well paid in China.
Does DeepSeek’s success mean we don’t need that many Nvidia GPUs?
Will Nvidia be affected in the short term by the drastic reduction in the cost of AI training? “I don’t think so, because when AI can be so popularized and generalized at a low cost, it will only increase the world’s demand for it,” wrote Sega Cheng, CEO and co-founder of iKala, a Taiwanese AI company.
Even if demand for Nvidia’s GPUs declines, Nvidia accounts for less than 15% of TSMC’s revenue and less than 10% of global semiconductor revenue. “As far as Nvidia’s major customers such as OpenAI, Microsoft, Amazon, Google, and Meta are concerned, it is unlikely that the GB200/300/Rubin orders that were previously placed will be drastically reduced in the short term, and it will take time to change the training methodology, so it is very likely that order adjustments will happen in 2026 and beyond,” opined Andrew Lu, a retired investment-bank semiconductor analyst based in Taiwan.
Demand for GPUs as a whole may not decrease, but there will certainly be competition among GPU users for the most energy-efficient solutions. We will continue to see cloud service providers and generative AI service providers develop their own application-specific ICs (ASICs) that work with their software and algorithms to optimize performance.
Those chips will continue to be produced by foundries that are most trusted by the customers.
Many research institutions, including Gartner and IDC, predict that global demand for semiconductors will grow by 14% to over 15% in 2025, thanks to robust growth in AI and high-performance computing (HPC). Meanwhile, TSMC chairman and CEO C.C. Wei said recently that TSMC is confident of a CAGR close to 20% over the next five years. The foundry giant is also increasing its capital spending again in 2025.
Since TSMC manufactures some 90% of the chips made on 7nm and more advanced processes, which are the chips needed for HPC and AI computing, it is likely to continue enjoying higher-than-average growth in the coming years.
In the short run, US export controls will still likely influence where the AI chips made by Nvidia and TSMC end up. The more important question is: if the trend is moving toward a more software-defined AI computing future, how will that affect demand for high-bandwidth memory (HBM) and heat-dissipation solutions for AI servers?
Why the DeepSeek phenomenon is significant: is the US-China AI war changing direction?
Export controls slowed the pace of China’s catch-up in AI technology, but DeepSeek’s win this round is unlikely to make the US give up the policy. The long game of competition for AI supremacy is becoming more complex.
Chris Miller, author of Chip War, explained at the CommonWealth Economic Forum in early January 2025 how AI is transforming the US-China Chip War into a broader “Cloud War.” He was right to see scaling laws faltering and efficiency overtaking raw scale. Whether through breakthroughs in inference compute, efficient algorithms, or geopolitical maneuvering, the Chip War is evolving into a broader contest for technological and economic supremacy in the age of AI, said Miller, who also believes tech decoupling is already in place.
Feng thinks DeepSeek has completely challenged the conventional thinking in Silicon Valley. “From an objective point of view, it is ironic that the U.S. ban has ignited the hidden potential of these Chinese geniuses, forcing them to innovate because they had no other choice.”
The success of DeepSeek may attract investment capital and talent away from other Chinese AI startups. Meanwhile, who knows whether DeepSeek’s engineers will be lured away by other companies? Internal competition among Chinese AI firms has been fierce, and employees have little loyalty to their employers.
Ethan Tu has also noticed that DeepSeek is already censoring prompts to make sure its answers are “politically correct”.
“Performance tests for generative AI platforms are like entrance exams. I am more concerned about the applications and how they will make a difference to society and the wellbeing of humanity as a whole,” wrote Tu, an AI expert and longtime advocate for the value of democracy.
The rise of DeepSeek AI marks a pivotal moment in the global AI race, proving that innovation can thrive under constraints. While U.S. export controls aimed to slow China’s progress, they may have inadvertently fueled a wave of ingenuity, forcing Chinese engineers to think differently and push efficiency over sheer scale. Yet, this breakthrough is unlikely to prompt Washington to reconsider its policies. If anything, it reinforces the view that the AI rivalry is evolving into a broader “Cloud War,” where technological and economic supremacy will be defined not just by hardware, but by intelligence, adaptability, and strategic maneuvering.
However, the road ahead remains uncertain. DeepSeek’s success could spark a surge of investment in China’s AI ecosystem, but internal competition, talent poaching, and the ever-present challenge of censorship cast shadows over its future. As Ethan Tu warns, true AI impact lies beyond mere performance tests—it’s about how these technologies shape society. For now, as the famous Chinese saying goes, “Let the bullets fly a little while longer.” The AI race is far from over, and the next chapter is yet to be written.
To read this Deep Dive as it was published on the AI Supremacy website, click here.