Sixty-six percent of enterprises worldwide said they would be investing in genAI over the next 18 months, according to IDC research. Among organizations that plan to increase IT spending on genAI in 2024, infrastructure will account for 46% of the total spend.
The problem: key pieces of hardware needed to build out that AI infrastructure are in short supply. GPUs are in high demand to run the most massive large language models (LLMs) behind genAI, but AI applications also need high-performance memory chips, and supplies of both remain tight, at least for now.
Training and inference tasks on LLMs can consume vast numbers of GPU cycles and be costly to run. Smaller, industry- or business-focused models can often deliver results better tailored to business needs, and they can run on common x86 processors paired with neural processing units (NPUs).
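To make that concrete, here is a minimal sketch of running a compact model entirely on a CPU, with no GPU in the loop. It assumes the Hugging Face transformers library; the model name and prompt are illustrative placeholders, not anything cited in the article:

    # A minimal sketch, assuming the Hugging Face transformers library.
    # Model name and prompt are illustrative, not from the article.
    from transformers import pipeline

    # device=-1 pins inference to the CPU -- no GPU or high-bandwidth
    # memory is required for a model this small.
    generator = pipeline(
        "text-generation",
        model="distilgpt2",  # stand-in for any compact, domain-tuned model
        device=-1,
    )

    result = generator("Draft a one-line summary of Q3 sales:",
                       max_new_tokens=40)
    print(result[0]["generated_text"])

A setup along these lines sidesteps the GPU supply crunch for narrow business tasks, though the largest frontier models still demand GPU-class hardware.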
“While much of the focus is on the use of high-performance GPUs for new AI workloads, the major hyperscalers (AWS, Google, Meta and Microsoft) are all investing in developing their own chips optimized for AI,” Priestley said.
While chip development is expensive, custom-designed silicon can improve operational efficiency, reduce the cost of delivering AI-based services, and make new AI-based applications cheaper for users to access, according to Priestley.
“As the market shifts from development to deployment, we expect to see this trend continue,” Priestley said.