Agreed. I am not longer paying token fees as I am running QWEN 3.6 27B MTP on my 4090 GPU and it is as good and as fast as the frontier models for agentic coding.
I am using llamma.cpp with QWEN 3.6 27B MTP, with a 64k context window on a 4090 that OpenCode talks to and then it in term talks to the Unity Game engine via MCP. Getting 80/112 tokens/second work 90 average which is shocking to me as it really does feel as fast as cloud AI (well faster for me as I am in Vietnam and round trips to US data centers really adds up in a session). The only really issue is you pretty much have to one shot prompts as follow up prompts will easily go over the context window size. If I cannot one shot prompts them use cloud AI both that is very rare for my use case. Maybe 1 in 50 or so and only when the tasks touches a lot of large scripts and scenes.
Same. I’m running Qwen3.6-35B-A3B-FP8 (Qwen3.6-35B-A3B-UD-IQ4_XS.gguf) via the turboquant fork of llama.cpp with a few tweaked memory settings, and I get like 40 tokens / second – nothing that required special insight on my part just following the instructions I saw on a youtube video I found via !LocalLLaMA@sh.itjust.works and asking claude to help me through the installation.
AI has no economic moat. There’s nothing stopping anyone from running LLMs locally.
I just updated my setup from LMStudio to llama.cpp with the new QWEN 3.6 27B MTP model and I am getting 80-112 tokens/second, 90 average which is just shocking to me. I am on a 4090 with a context
Window of 64k. It hardly use cloud AI anymore as I rarely need more than 64k if I ensure my first prompt is written like a design document. Multiple prompts are not great so I often just figure out where my initial prompt went wrong, adjust and try again in a fresh session. Way faster this way too. It has really worked out well for me as I am getting just as much done locally for free as I was with hundreds of dollar a month on cloud AI. I am still shocked and grateful it flowed this way.
Agreed. I am not longer paying token fees as I am running QWEN 3.6 27B MTP on my 4090 GPU and it is as good and as fast as the frontier models for agentic coding.
What’s the rest of your stack look like?
I am using llamma.cpp with QWEN 3.6 27B MTP, with a 64k context window on a 4090 that OpenCode talks to and then it in term talks to the Unity Game engine via MCP. Getting 80/112 tokens/second work 90 average which is shocking to me as it really does feel as fast as cloud AI (well faster for me as I am in Vietnam and round trips to US data centers really adds up in a session). The only really issue is you pretty much have to one shot prompts as follow up prompts will easily go over the context window size. If I cannot one shot prompts them use cloud AI both that is very rare for my use case. Maybe 1 in 50 or so and only when the tasks touches a lot of large scripts and scenes.
Same. I’m running Qwen3.6-35B-A3B-FP8 (Qwen3.6-35B-A3B-UD-IQ4_XS.gguf) via the turboquant fork of llama.cpp with a few tweaked memory settings, and I get like 40 tokens / second – nothing that required special insight on my part just following the instructions I saw on a youtube video I found via !LocalLLaMA@sh.itjust.works and asking claude to help me through the installation.
AI has no economic moat. There’s nothing stopping anyone from running LLMs locally.
I just updated my setup from LMStudio to llama.cpp with the new QWEN 3.6 27B MTP model and I am getting 80-112 tokens/second, 90 average which is just shocking to me. I am on a 4090 with a context Window of 64k. It hardly use cloud AI anymore as I rarely need more than 64k if I ensure my first prompt is written like a design document. Multiple prompts are not great so I often just figure out where my initial prompt went wrong, adjust and try again in a fresh session. Way faster this way too. It has really worked out well for me as I am getting just as much done locally for free as I was with hundreds of dollar a month on cloud AI. I am still shocked and grateful it flowed this way.
What do you run it on?
https://www.amazon.com/dp/B0BV8H8HVD with linux mint installed.