- 🆕 Fast-inference variant of GLM-4.7, tuned for speed.
- 🧠 30B-A3B Mixture-of-Experts with roughly 3B active parameters.
- 📏 128K-token context window for long inputs.
- 🔧 Reasoning, function-calling, and web search supported.
- ⚡ Optimized for quick responses on latency-sensitive workloads.
- 🔒 Open-weight under the MIT license, fp8 quantized.
- 🏢 Built by Z.ai (formerly Zhipu AI), released January 2026.
Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of large language…
Explore 10 more models by Z.ai →GLM 4.7 Flash is the speed-optimized member of Z.ai's GLM-4.7 generation, positioned as a lightweight companion to the full [[sibling:zai-org-glm-4.7|GLM 4.7]]. It is a 30B-A3B Mixture-of-Experts model, activating roughly 3B parameters per token. The model ships open-weight under the MIT license with a 128K-token context window and fp8 quantization.
Compared with its same-family parent, Flash is tuned for faster, cheaper inference while inheriting the 4.7 generation's coding, tool-calling, and multi-step reasoning behaviors. Z.ai exposes reasoning, function-calling, and web-search capabilities, making it suitable for agentic and tool-augmented workflows.
Because it is tuned for throughput, Flash is best suited to high-volume, latency-sensitive, or background tasks, whereas the full GLM 4.7 remains the heavier option for the most demanding prompts.
Within the broader family, GLM 4.7 Flash follows earlier releases such as [[sibling:zai-org-glm-4.6|GLM 4.6]] and sits alongside the larger [[sibling:zai-org-glm-5|GLM 5]] line. It is distributed via Hugging Face, where the open weights and model card are published for self-hosting and deployment.
This About section is AI-generated from public sources via VeniceStats + Venice inference, with no human editing. It may contain inaccuracies.
| Seller | Reputation↓ | Input $/M | Cached $/M | Output $/M | Categories | API |
|---|---|---|---|---|---|---|
| Venice.ai Proxy 0x1f22…18c9 | 88 | $0.0625 | $0.0625 | $0.25 | chat,reasoning,web-search | openai-chat-completions |
| Open Forge 0x1d90…b0aa | 87 | $0.06 | $0.01 | $0.40 | chat,fast | openai-chat-completions |
| surplusintelligence.ai 0x0e49…8927 | 79 | $0.125 | $0.125 | $0.50 | anon,chat,reasoning,web-search,cheap,research,translate | openai-chat-completions |
| Open Bird 0xc0f1…8183 | 57 | $0.03 | $0.03 | $0.20 | chat | openai-chat-completions |
| Fire Ant 🔥🐜 0xbe05…bc5d | 45 | $0.025 | $0.025 | $0.1011 | anon,chat,cheap,code,coding,e2ee,fast,json,open-source,privacy,reasoning,research,tee,tools,translate,web-search | — |
| ▲ Apex Ant 0x73b4…e736 | 40 | $0.0012 | $0.0002 | $0.0049 | chat,fast,open-source,privacy | openai-chat-completions |
| Leftermute 0x388b…5389 | 26 | $0.0118 | $0.0118 | $0.05 | chat,coding,json,tools | openai-chat-completions |
"Best price" and the seller table are live AntSeed catalog data (advertised $/1M tokens, not settled amounts). Reputation = on-chain trust (0-100). Model knowledge (TLDR, provider, About) via the VeniceStats enrichment layer. Advertised catalog, not the model used in any specific purchase.