- 🧠 Mixture-of-experts: 235B total parameters, 22B active per token.
- ⚡ Non-thinking variant: direct answers, no reasoning traces.
- 📏 Model card cites 256K native context, extendable toward 1M tokens.
- 🔧 Strong tool-calling and agentic use via Qwen-Agent and MCP.
- 🌐 Multilingual coverage across many languages and dialects.
- 🔒 Apache 2.0 license, offered here in FP8 quantization.
- 🏢 Built by Alibaba's Qwen team.
- 🎯 Aimed at long documents, technical work, high-precision tasks.
Alibaba Group is a Chinese multinational technology company founded in 1999 and headquartered in Hangzhou, Zhejiang. Originally built around e-commerce and cloud computing, Alibaba has become one of the most prolific contributors to open-weight AI research, developing the Qwen…
Qwen 3 235B A22B Instruct 2507 is a flagship Mixture-of-Experts model from Alibaba's Qwen team, released in 2025. It holds 235 billion total parameters but activates roughly 22 billion per token, balancing capacity with inference cost. This is the instruction-tuned "non-thinking" line: it returns direct responses without producing intermediate reasoning blocks, which makes outputs faster and more format-consistent than those of the reasoning-chain variants. It carries an Apache 2.0 license and is offered here in FP8, which roughly halves the weight memory footprint relative to 16-bit precision.
The 2507 update is positioned as the refreshed version of the original Qwen3-235B-A22B non-thinking mode. Per Qwen's model card, it brings improvements in instruction following, logical reasoning, text comprehension, mathematics, science, coding, multilingual understanding, and tool usage over that predecessor. The card also describes enhanced 256K long-context understanding, with configurations enabling ultra-long inputs toward one million tokens.
Its closest sibling is [[sibling:qwen3-235b-a22b-thinking-2507|Qwen 3 235B A22B Thinking 2507]], which shares the same architecture but generates explicit reasoning chains for complex problems, trading latency and token use for deeper deliberation. For vision and multimodal work, the family extends to [[sibling:qwen3-vl-235b-a22b|Qwen3 VL 235B]].
In practice, this Instruct variant suits high-throughput, latency-sensitive workloads (chatbots, API integrations, document analysis, and code generation) where consistent formatting matters more than visible step-by-step reasoning. Self-hosting is demanding: even in FP8 the full set of expert weights must stay resident in memory, so deployment typically requires multi-GPU tensor parallelism.
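As a rough illustration of why deployment spans several GPUs, the sketch below estimates weight memory from the parameter counts quoted above. The 80 GiB GPU size and the 90% usable fraction are assumptions chosen for illustration, not figures from the model card, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
import math

TOTAL_PARAMS = 235e9   # total parameters, as stated in the text
ACTIVE_PARAMS = 22e9   # parameters active per token, as stated in the text

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gib(params: float, precision: str) -> float:
    """Approximate weight storage in GiB (weights only)."""
    return params * BYTES_PER_PARAM[precision] / 2**30


def min_gpus(params: float, precision: str,
             gpu_mem_gib: float = 80.0,       # assumed GPU size (hypothetical)
             usable_fraction: float = 0.9) -> int:  # assumed headroom
    """Lower bound on GPUs needed just to hold the weights."""
    usable = gpu_mem_gib * usable_fraction
    return math.ceil(weight_gib(params, precision) / usable)


if __name__ == "__main__":
    for precision in ("bf16", "fp8"):
        print(precision,
              f"{weight_gib(TOTAL_PARAMS, precision):.1f} GiB",
              f">= {min_gpus(TOTAL_PARAMS, precision)} GPUs")
```

The estimate uses the total parameter count because all experts have to be loaded in a mixture-of-experts model; the 22B active figure lowers compute per token, not the memory needed for weights. Real deployments usually round the GPU count up to a tensor-parallel size the serving framework supports.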
This About section is AI-generated from public sources via VeniceStats + Venice inference, with no human editing. It may contain inaccuracies.
| Seller | Reputation (0-100) | Input ($/1M tokens) | Cached ($/1M tokens) | Output ($/1M tokens) | Categories | API |
|---|---|---|---|---|---|---|
| Venice.ai Proxy 0x1f22…18c9 | 88 | $0.075 | $0.075 | $0.375 | chat,web-search | openai-chat-completions |
| Open Forge 0x1d90…b0aa | 87 | $0.00 | $0.00 | $0.00 | chat,fast,free | openai-chat-completions |
| surplusintelligence.ai 0x0e49…8927 | 79 | $0.15 | $0.15 | $0.75 | anon,chat,web-search,cheap,translate,code,coding,tasks,agents,fast,research | openai-chat-completions |
| Fire Ant 🔥🐜 0xbe05…bc5d | 45 | $0.065 | $0.065 | $0.365 | agents,anon,chat,cheap,code,coding,fast,long-context,open-source,privacy,research,tasks,translate,web-search | — |
| ▲ Apex Ant 0x73b4…e736 | 40 | $0.0032 | $0.0006 | $0.0045 | chat,open-source,long-context,tasks,coding,privacy,translate | openai-chat-completions |
| D5V1N2 0xd5e7…7be0 | 35 | $0.052 | $0.008 | $0.108 | chat,reasoning,research,venice,anon,qwen,qwen3,qwen-235b,qwen3-235b,instruct,cheap | openai-completions,openai-chat-completions |
| ➤Bullet Ant 🐜 0xe924…8936 | 16 | $0.20 | $0.20 | $0.60 | chat,coding,reasoning,anon | openai-chat-completions |
"Best price" and the seller table are live AntSeed catalog data (advertised $/1M tokens, not settled amounts). Reputation = on-chain trust (0-100). Model knowledge (TLDR, provider, About) comes from the VeniceStats enrichment layer. This is the advertised catalog; it does not identify the model used in any specific purchase.
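To make the advertised per-million-token prices concrete, here is a minimal sketch that turns one table row into an estimated request cost. The `Price` class and `request_cost` function are hypothetical helpers, and the rule that cached input tokens are billed at the cached rate instead of the input rate is an assumption; the catalog does not say how each seller applies it. The results are advertised estimates, not settled amounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Advertised prices in dollars per one million tokens."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


def request_cost(price: Price, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimated dollar cost of one request.

    Assumption: cached input tokens are charged at the cached rate
    and the remaining input tokens at the regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * price.input_per_m
             + cached_tokens * price.cached_per_m
             + output_tokens * price.output_per_m)
    return total / 1_000_000


# Two rows copied from the seller table above.
venice_proxy = Price(input_per_m=0.075, cached_per_m=0.075, output_per_m=0.375)
d5v1n2 = Price(input_per_m=0.052, cached_per_m=0.008, output_per_m=0.108)

if __name__ == "__main__":
    # A 100K-token document with a 2K-token answer.
    print(f"${request_cost(venice_proxy, 100_000, 2_000):.5f}")
    # Same request, with 80K of the input served from cache.
    print(f"${request_cost(d5v1n2, 100_000, 2_000, cached_tokens=80_000):.6f}")
```

For long-document workloads the input side dominates the bill, so the cached column matters mainly for sellers whose cached rate is well below their input rate.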