- 🧠 Reasoning-optimized variant of Arcee AI's Trinity-Large 398B sparse MoE family.
- ⚡ Roughly 13B active parameters per token for efficient inference.
- 📏 256K context window for long, multi-step agentic chains.
- 💬 Emits extended chain-of-thought inside reasoning-trace blocks.
- 🔧 Tool calling and agentic RL post-training for long-horizon tasks.
- 🌐 Multilingual training across 14 non-English languages.
- 🔒 Released under Apache 2.0 in 2026.
- 🏢 Built on Trinity-Large-Base; trained with Muon optimizer and SMEBU.
Arcee AI is an artificial intelligence company focused on developing advanced language models. The organization has built a reputation in the open-source AI community for its work on model optimization and specialized text generation architectures.
Trinity Large Thinking is the reasoning-oriented member of Arcee AI's Trinity-Large series, a sparse Mixture-of-Experts model with roughly 398–400B total parameters and about 13B activated per token. It shares the same MoE architecture as the chat-focused Trinity-Large-Preview but is post-trained for extended chain-of-thought reasoning and agentic reinforcement learning, making it suited to long-horizon agents, multi-turn tool calling, and audit-friendly stepwise output.
The chief distinction from its same-family predecessors is reasoning behavior. Where Trinity-Large-Preview is lightly post-trained and chat-ready without trace output, Thinking emits intermediate reasoning inside dedicated reasoning-trace blocks before its final answer, and it is built on the Trinity-Large-Base foundation rather than being a fresh pretraining run.
Architecturally, the Trinity-Large family uses 256 experts with 4 active per token, interleaved local and global attention, gated attention, and sigmoid routing, according to Arcee's technical report. Training used the Muon optimizer plus a load-balancing technique called Soft-clamped Momentum Expert Bias Updates (SMEBU) across a 17-trillion-token pretraining recipe, completing with zero loss spikes.
The model supports tool calling, multilingual input, and a large context window for sustained agentic workflows. It is distributed under Apache 2.0, with FP8 weights and quantized GGUF builds available for self-hosting.
This About section is AI-generated from public sources via VeniceStats + Venice inference, with no human editing. It may contain inaccuracies.
| Seller | Reputation↓ | Input $/M | Cached $/M | Output $/M | Categories | API |
|---|---|---|---|---|---|---|
| Venice.ai Proxy 0x1f22…18c9 | 88 | $0.1563 | $0.0375 | $0.5625 | chat,reasoning,coding,web-search | openai-chat-completions |
| surplusintelligence.ai 0x0e49…8927 | 79 | $0.3125 | $0.075 | $1.125 | anon,chat,code,coding,reasoning,web-search,research | openai-chat-completions |
| Fire Ant 🔥🐜 0xbe05…bc5d | 45 | $0.0625 | $0.0275 | $0.225 | anon,chat,code,coding,json,reasoning,research,tools,web-search | — |
| ▲ Apex Ant 0x73b4…e736 | 40 | $0.0027 | $0.0005 | $0.009 | chat,reasoning,research | openai-chat-completions |
| Leftermute 0x388b…5389 | 26 | $0.0631 | $0.0631 | $0.2273 | chat,coding,json,reasoning,tools | openai-chat-completions |
| Meridian AI 0x8c8c…06f5 | 2 | $0.0675 | $0.0675 | $0.243 | chat,reasoning | openai-chat-completions |
"Best price" and the seller table are live AntSeed catalog data (advertised $/1M tokens, not settled amounts). Reputation = on-chain trust (0-100). Model knowledge (TLDR, provider, About) via the VeniceStats enrichment layer. Advertised catalog, not the model used in any specific purchase.