DeepakNess

Cheaper coding models are excellent with good prompts

Dan posted about cancelling Cursor after 30 minutes of using Composer 2.5, and while the wording was too harsh, I don't think the frustration itself is completely wrong.

I posted about this on X as well, but I think the main thing people miss is that cheaper or smaller models need to be prompted differently.

You can't prompt DeepSeek, Kimi, GLM, Composer, Qwen, or other similar models the same way you prompt Claude Code or Codex.

With Claude Code and Codex, especially with the latest models, you can throw a broad task at them and expect them to explore the repository, understand patterns, implement the feature, run tests, and come back with something mostly correct. They will still make mistakes, obviously, but they can handle much more ambiguity.

But these cheaper models are not great at that kind of one-shot work.

If you ask them to "build this full dashboard", "rework the entire app", or "implement this complete feature end-to-end", the result is usually messy. They will often do too much, miss existing patterns, break small things, or create a UI that looks good in one part but doesn't fully fit the app.

But if you go feature-by-feature or page-by-page, they can be excellent.

For example, instead of saying:

Build the entire analytics dashboard with filters, charts, empty states, settings, export, and responsive UI.

I would rather ask:

  1. create the empty dashboard layout
  2. add only the filters section
  3. create the first chart
  4. add loading and empty states
  5. make the table work
  6. polish spacing and mobile layout
  7. now review the whole page and fix rough edges
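The step-by-step workflow above can be sketched as a small loop: send one narrowly scoped prompt, review the result, and only then move on. This is a minimal sketch, not a real agent integration. `run_steps`, `build_prompt`, `send_prompt`, and `approve` are hypothetical names, and `send_prompt` stands in for whatever coding agent or API you actually use.

```python
from typing import Callable, List

# The seven steps from the list above, one prompt each.
DASHBOARD_STEPS: List[str] = [
    "create the empty dashboard layout",
    "add only the filters section",
    "create the first chart",
    "add loading and empty states",
    "make the table work",
    "polish spacing and mobile layout",
    "now review the whole page and fix rough edges",
]


def build_prompt(step: str, index: int, total: int) -> str:
    """Wrap one step in wording that keeps the model inside its scope."""
    return (
        f"Step {index} of {total}: {step}. "
        "Do only this step, follow the existing patterns in the codebase, "
        "and do not change anything unrelated."
    )


def run_steps(
    steps: List[str],
    send_prompt: Callable[[str], str],    # hypothetical: calls your agent
    approve: Callable[[str, str], bool],  # hypothetical: your manual review
) -> List[str]:
    """Run steps in order and stop at the first one the reviewer rejects.

    Returns the list of steps that were approved.
    """
    completed: List[str] = []
    total = len(steps)
    for index, step in enumerate(steps, start=1):
        output = send_prompt(build_prompt(step, index, total))
        if not approve(step, output):
            # Stop here so the problem can be fixed before more work piles up.
            break
        completed.append(step)
    return completed
```

The important part is the `approve` check between steps: it is where you read the diff or look at the page before letting the model continue, which is what keeping the model on a tighter leash amounts to in practice.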

This is a different workflow. You need more patience. You need to be more involved. You need to keep the model on a tighter leash.

But the tradeoff is worth it in many cases because these models are cheap and fast enough that iterating with them still feels good.

A few days ago, I built a small internal feature for SharePDF using DeepSeek V4 Pro via Pi agent. Honestly, the UI was far better than what Codex usually gives me, and everything worked as expected. But I didn't ask it to rebuild the whole product in one prompt. I gave it a smaller scope, checked the output, and then continued from there.

This is why I still think models like Composer 2.5, DeepSeek, Kimi, GLM, and Qwen are very useful. In fact, I have already written that Cursor's Composer 2.5 is awesome and that Cursor's $20 plan is great value. That doesn't mean the model will behave like Claude Code or Codex in every situation.

Sometimes, when you have the option, there is no point in using the cheaper model. If I'm doing a complicated refactor, touching billing logic, debugging something subtle, or making a change across many files, I would still prefer the best model I have access to.

But if I'm designing a page, improving UI, adding a smaller feature, or doing something where I can review each step quickly, I don't mind using cheaper models at all. In some cases, I even prefer them. I still consider the Qwen 3.7, or even the 3.6 Plus, models far better at design than GPT models.
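As a rough illustration, the rule of thumb in the last two paragraphs could be written down as a tiny helper. This is only a sketch of my own heuristic, not a benchmark-backed policy. The `Task` fields, the `MANY_FILES` threshold, and the `"frontier"` / `"cheaper"` labels are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "a change across many files"; pick your own.
MANY_FILES = 5


@dataclass
class Task:
    description: str
    files_touched: int = 1
    complicated_refactor: bool = False
    touches_critical_logic: bool = False  # e.g. billing
    subtle_debugging: bool = False
    quick_to_review: bool = True          # can each step be checked fast?


def pick_tier(task: Task) -> str:
    """Return "frontier" for risky or sprawling work, otherwise "cheaper"."""
    risky = (
        task.complicated_refactor
        or task.touches_critical_logic
        or task.subtle_debugging
        or task.files_touched >= MANY_FILES
    )
    if risky:
        return "frontier"
    # Assumption: if the steps cannot be reviewed quickly, the extra
    # autonomy of the stronger model is worth paying for.
    return "cheaper" if task.quick_to_review else "frontier"
```

For example, `pick_tier(Task("polish dashboard spacing"))` picks the cheaper tier, while a task with `touches_critical_logic=True` goes to the best model available.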

So I think the correct conclusion is not "Composer is bad" or "DeepSeek is bad" or "cheap models are useless".

The correct conclusion is:

Don't use a cheaper model the way you would use a SOTA model.

Use Claude Code or Codex when you want more autonomy. Use DeepSeek, Kimi, GLM, Composer, or Qwen when you are willing to drive more actively.

Both can be great. You just can't expect the same prompting style to work everywhere.
