After emerging as a key player in the deep-reasoning wave among AI companies, DeepSeek has introduced an experimental technique called DeepSeek Sparse Attention (DSA). The new mechanism is designed to explore and validate optimizations for training and inference efficiency on long queries, DeepSeek said on Sept. 29.
What is a sparse attention mechanism?
In generative AI, a sparse attention mechanism limits which tokens the model compares. Instead of connecting every token to every other token, as standard attention does, each token attends only to a smaller subset of tokens. This reduces the computation and memory needed to produce a response, and the savings are most visible on queries containing thousands or hundreds of thousands of tokens.
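To make the idea concrete, here is a minimal PyTorch sketch of one common sparse pattern, sliding-window attention, in which each token attends only to itself and a few preceding tokens. This is illustrative only and is not DeepSeek's DSA mechanism; a production kernel would also skip the masked blocks entirely rather than materialize the full score matrix as this toy version does.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """Toy sparse attention: each query position i attends only to
    key positions j with 0 <= i - j < window, instead of the full
    sequence. q, k, v have shape (seq_len, d); returns (seq_len, d)."""
    seq_len, d = q.shape
    scores = q @ k.T / d ** 0.5                  # (seq_len, seq_len)
    pos = torch.arange(seq_len)
    dist = pos.unsqueeze(1) - pos.unsqueeze(0)   # dist[i, j] = i - j
    # Mask out future tokens (dist < 0) and tokens beyond the window.
    mask = (dist < 0) | (dist >= window)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(16, 8)
out = sliding_window_attention(q, k, v, window=4)
print(out.shape)  # torch.Size([16, 8])
```

With a window of 4, each token's attention cost stays constant as the sequence grows, instead of scaling with the full sequence length as in dense attention.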
DeepSeek-V3.2-Exp, the model built on sparse attention, has weights and code available on Hugging Face for local use. It is also available on the web, in the DeepSeek app, and through the API.
The release is “an intermediate step toward our next-generation architecture,” DeepSeek wrote on the model card on Sept. 29.
Precision formats: FP8 today, BF16 in progress
DeepSeek has suggested its newest models support FP8, or 8-bit floating point, precision, Bloomberg reported on Monday. FP8 is commonly used in AI training to improve efficiency through faster computation and lower memory consumption. In addition, DeepSeek is working on supporting BF16, or Brain Floating Point 16, a 16-bit format that preserves FP32's numeric range while allowing faster calculation.
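As a rough illustration of what these formats mean in practice, the PyTorch snippet below casts a tensor to BF16 and to one FP8 variant and compares per-element storage. The FP8 dtype (torch.float8_e4m3fn) requires a recent PyTorch build, and real training pipelines pair such casts with scaling logic that this sketch omits.

```python
import torch

x = torch.randn(4, 4, dtype=torch.float32)

# BF16 keeps FP32's 8-bit exponent but truncates the mantissa,
# trading precision for speed and memory.
x_bf16 = x.to(torch.bfloat16)

# FP8 (the e4m3 variant here) halves storage again; recent PyTorch
# builds expose it as a storage dtype for quantized weights.
x_fp8 = x.to(torch.float8_e4m3fn)

# Bytes per element: 4 (FP32), 2 (BF16), 1 (FP8).
print(x.element_size(), x_bf16.element_size(), x_fp8.element_size())
```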
Pricing and developer access
The company cut DeepSeek API prices by 50% or more on Monday. The lower prices give prospective developers an easier on-ramp to the new model, a move DeepSeek may be betting will convert trial users into long-term customers.
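For developers weighing that on-ramp, a first call can look like the sketch below. DeepSeek's API follows the OpenAI-compatible chat-completions convention, so the official openai Python client works with a swapped base URL; the API key is a placeholder, and the model name reflects DeepSeek's documentation at the time of writing rather than anything confirmed in this article.

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint, so the standard
# client only needs a different base_url and key.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-chat",  # name per DeepSeek's docs; may change
    messages=[{"role": "user",
               "content": "Summarize sparse attention in one line."}],
)
print(resp.choices[0].message.content)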
Huawei supports DeepSeek, offering a Chinese alternative to Nvidia
Huawei says its Ascend chips will support inference for the new model, Bloomberg reported. Meanwhile, Huawei is ramping up chip production in China, positioning itself as an alternative to the Nvidia AI chips produced in the US.
Last week, Huawei detailed a three-year plan to use its new “SuperPoD” systems to link thousands of Ascend processors together.
Huawei competes with Nvidia, but US-China export rules create an uncertain regulatory environment for advanced AI chips.