- 🧠 Unified reasoning and non-reasoning model, trained from scratch by NVIDIA.
- 🔧 Hybrid Mamba-2 plus Transformer backbone with sparse Mixture-of-Experts layers.
- 📏 3.2B active parameters, 31.6B total; supports up to 1M-token context.
- ⚡ Optimized for high-throughput inference on a single H200 GPU.
- 🆕 Better accuracy than the prior Nemotron 2 Nano while activating fewer parameters.
- 🎯 Built for agentic AI, chatbots, RAG, and coding workflows.
- 🌐 Trained on English plus 19 languages and 43 programming languages.
- 🔒 Fully open weights, datasets, and training recipes released.
Nvidia Corporation is an American technology company founded in 1993 by Jensen Huang, Chris Malachowsky, and Curtis Priem, headquartered in Santa Clara, California. Long recognized as the dominant force in graphics processing units, Nvidia has expanded into a central pillar of…
NVIDIA Nemotron 3 Nano 30B-A3B is a compact Mixture-of-Experts language model in the Nemotron 3 family, trained from scratch by NVIDIA and designed as a unified system for both reasoning and non-reasoning tasks. It first generates a reasoning trace and then concludes with a final response, targeting developers building AI agents, chatbots, and retrieval-augmented systems. Architecturally, it pairs a hybrid Mamba-2 and Transformer backbone with sparse MoE feed-forward layers, activating just 3.2B of its 31.6B total parameters per forward pass.
NVIDIA states that Nemotron 3 Nano achieves better accuracy than its previous-generation predecessor, Nemotron 2 Nano, while activating less than half the parameters per forward pass. NVIDIA positions the model for high inference throughput on a single H200 GPU at an 8K-input/16K-output setting.
The model supports context up to 1M tokens, though default deployment settings and VRAM limits often cap it at 256K; this catalog entry exposes a 128K window with FP8 quantization. Training data covers webpages, dialogue, and articles in English, 19 additional languages, and 43 programming languages.
Within Venice's NVIDIA lineup, Nemotron 3 Nano sits alongside the later Nemotron Cascade 2 30B A3B text model, the Nemotron Embed VL 1B v2 embedding model, and the Parakeet ASR speech model. NVIDIA released the weights, training recipe, and redistributable data openly.
This About section is AI-generated from public sources via VeniceStats + Venice inference, with no human editing. It may contain inaccuracies.
| Seller | Reputation↓ | Input $/M | Cached $/M | Output $/M | Categories | API |
|---|---|---|---|---|---|---|
| Venice.ai Proxy 0x1f22…18c9 | 88 | $0.0375 | $0.0375 | $0.15 | chat,web-search | openai-chat-completions |
| surplusintelligence.ai 0x0e49…8927 | 79 | $0.075 | $0.075 | $0.30 | anon,chat,web-search | openai-chat-completions |
| Open Bird 0xc0f1…8183 | 57 | $0.025 | $0.025 | $0.10 | chat,open-source | openai-chat-completions |
| Fire Ant 🔥🐜 0xbe05…bc5d | 45 | $0.015 | $0.015 | $0.09 | anon,chat,coding,json,open-source,web-search | — |
| ▲ Apex Ant 0x73b4…e736 | 40 | $0.00 | $0.00 | $0.00 | chat,fast,cheap,open-source,free | openai-chat-completions |
| Leftermute 0x388b…5389 | 26 | $0.0069 | $0.0069 | $0.0273 | chat,coding,json | openai-chat-completions |
| Meridian AI 0x8c8c…06f5 | 2 | $0.0073 | $0.0073 | $0.0292 | chat | openai-chat-completions |
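
Since the table quotes prices per 1M tokens, the cost of a single request follows from simple arithmetic. The sketch below is illustrative only: it copies the advertised input and output prices from the table above, and the names `SELLERS`, `estimate_cost`, and `rank_sellers` are hypothetical helpers, not part of any AntSeed or Venice API. It ignores the cached-input column, which matches the input price for every seller listed, and it assumes round token counts of 8,000 in and 16,000 out as a stand-in for the 8K-input/16K-output setting mentioned earlier.

```python
# Advertised prices in USD per 1M tokens, copied from the seller table.
# Tuple layout: (input price, output price). These are catalog figures,
# not settled amounts, and may change.
SELLERS = {
    "Venice.ai Proxy": (0.0375, 0.15),
    "surplusintelligence.ai": (0.075, 0.30),
    "Open Bird": (0.025, 0.10),
    "Fire Ant": (0.015, 0.09),
    "Apex Ant": (0.00, 0.00),
    "Leftermute": (0.0069, 0.0273),
    "Meridian AI": (0.0073, 0.0292),
}

TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted per 1M tokens


def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the advertised cost in USD for one request."""
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


def rank_sellers(input_tokens, output_tokens, sellers=SELLERS):
    """Return (seller, cost) pairs sorted from cheapest to most expensive."""
    costs = [
        (name, estimate_cost(input_tokens, output_tokens, in_price, out_price))
        for name, (in_price, out_price) in sellers.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Assumed workload: 8,000 input tokens and 16,000 output tokens.
    for name, cost in rank_sellers(8_000, 16_000):
        print(f"{name:<24} ${cost:.6f}")
```

Price is only one column of the table: a ranking like this says nothing about reputation, categories, or whether a seller exposes an API, so those would still need to be weighed separately.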
"Best price" and the seller table are live AntSeed catalog data (advertised $/1M tokens, not settled amounts). Reputation = on-chain trust (0-100). Model knowledge (TLDR, provider, About) via the VeniceStats enrichment layer. Advertised catalog, not the model used in any specific purchase.