Z.ai·text

GLM 4.7 Flash

ReasoningWeb searchFunction calling

Quick reference

GLM 4.7 Flash — TLDR

🆕 Fast-inference variant of GLM-4.7, tuned for speed.
🧠 30B-A3B Mixture-of-Experts with roughly 3B active parameters.
📏 128K-token context window for long inputs.
🔧 Reasoning, function-calling, and web search supported.
⚡ Optimized for quick responses on latency-sensitive workloads.
🔒 Open-weight under the MIT license, fp8 quantized.
🏢 Built by Z.ai (formerly Zhipu AI), released January 2026.

💰 Best price on AntSeed

$0.0012 / $0.0049−99%

per 1M · cheapest in / out

📏 Context

128K tokens

🐜 Sellers

advertising on AntSeed

Provider

Z.ai

Z.ai, formally Knowledge Atlas Technology Joint Stock Co., Ltd., is a Chinese technology company specializing in artificial intelligence. Previously known internationally as Zhipu AI, the company rebranded to Z.ai in 2025. Its core focus is the GLM family of large language…

Site ↗X ↗Wikipedia ↗

Explore 10 more models by Z.ai →

About this model

GLM 4.7 Flash is the speed-optimized member of Z.ai's GLM-4.7 generation, positioned as a lightweight companion to the full [[sibling:zai-org-glm-4.7|GLM 4.7]]. It is a 30B-A3B Mixture-of-Experts model, activating roughly 3B parameters per token. The model ships open-weight under the MIT license with a 128K-token context window and fp8 quantization.

Compared with its same-family parent, Flash is tuned for faster, cheaper inference while inheriting the 4.7 generation's coding, tool-calling, and multi-step reasoning behaviors. Z.ai exposes reasoning, function-calling, and web-search capabilities, making it suitable for agentic and tool-augmented workflows.

Because it is tuned for throughput, Flash is best suited to high-volume, latency-sensitive, or background tasks, whereas the full GLM 4.7 remains the heavier option for the most demanding prompts.

Within the broader family, GLM 4.7 Flash follows earlier releases such as [[sibling:zai-org-glm-4.6|GLM 4.6]] and sits alongside the larger [[sibling:zai-org-glm-5|GLM 5]] line. It is distributed via Hugging Face, where the open weights and model card are published for self-hosting and deployment.

View source on GitHub ↗View model card on HuggingFace ↗

Sources

zai-org/GLM-4.7-Flash · Hugging Face· huggingface.co

This About section is AI-generated from public sources via VeniceStats + Venice inference, with no human editing. It may contain inaccuracies.

Sellers serving GLM 4.7 Flash (7)

Seller	Reputation↓	Input $/M	Cached $/M	Output $/M	Categories	API
Venice.ai Proxy 0x1f22…18c9	88	$0.0625	$0.0625	$0.25	chat,reasoning,web-search	openai-chat-completions
Open Forge 0x1d90…b0aa	87	$0.06	$0.01	$0.40	chat,fast	openai-chat-completions
surplusintelligence.ai 0x0e49…8927	79	$0.125	$0.125	$0.50	anon,chat,reasoning,web-search,cheap,research,translate	openai-chat-completions
Open Bird 0xc0f1…8183	57	$0.03	$0.03	$0.20	chat	openai-chat-completions
Fire Ant 🔥🐜 0xbe05…bc5d	45	$0.025	$0.025	$0.1011	anon,chat,cheap,code,coding,e2ee,fast,json,open-source,privacy,reasoning,research,tee,tools,translate,web-search	—
▲ Apex Ant 0x73b4…e736	40	$0.0012	$0.0002	$0.0049	chat,fast,open-source,privacy	openai-chat-completions
Leftermute 0x388b…5389	26	$0.0118	$0.0118	$0.05	chat,coding,json,tools	openai-chat-completions

"Best price" and the seller table are live AntSeed catalog data (advertised $/1M tokens, not settled amounts). Reputation = on-chain trust (0-100). Model knowledge (TLDR, provider, About) via the VeniceStats enrichment layer. Advertised catalog, not the model used in any specific purchase.