I tested the new DeepSeek V3 (0324) vs Claude 3.7 Sonnet in a 250k Token Codebase...

I used Aider to test the coding skills of the new DeepSeek V3 (0324) vs Claude 3.7 Sonnet, and boy did DeepSeek deliver. DeepSeek V3 is now under an MIT license and, as always, is open weights. GOAT. I tested their tool-use abilities using Cline MCP servers (Brave Search and Puppeteer), and their frontend bug-fixing skills using Aider on a Vite + React fullstack app. Some TL;DR findings:

- They rank the same in tool use, which is a huge improvement over the previous DeepSeek V3

- DeepSeek holds its ground very well against 3.7 Sonnet in almost all coding tasks, backend and frontend

- To watch them in action: https://youtu.be/MuvGAD6AyKE

- DeepSeek's inference speed still degrades a lot as its context grows

- 3.7 Sonnet feels weaker than 3.5 in many larger codebase edits

- To take advantage of DeepSeek, you need to actively manage context (Aider is best for this) using /add and /tokens. Not for cost, of course, but for speed, since it slows down as context grows

- Aider's new /context feature was released after the video; I'd love to see how efficient and agentic it is vs Cline/RooCode

- If you blacklist slow providers in OpenRouter, you actually get decent speeds with DeepSeek
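On the OpenRouter point: besides ignoring providers in your account settings, requests can carry per-request provider preferences. Here's a minimal sketch of such a request body, assuming OpenRouter's documented `provider` routing object with `ignore` and `allow_fallbacks` fields and the `deepseek/deepseek-chat-v3-0324` model slug; the provider names listed are placeholders, not a recommendation:

```python
import json

# Sketch of an OpenRouter chat-completions request body that blacklists
# slow providers for a single request. The "provider" object follows
# OpenRouter's provider-routing docs; provider names are placeholders.
payload = {
    "model": "deepseek/deepseek-chat-v3-0324",
    "messages": [{"role": "user", "content": "Fix this React bug: ..."}],
    "provider": {
        "ignore": ["slow-provider-a", "slow-provider-b"],  # skip these hosts
        "allow_fallbacks": True,  # still fall back to other providers
    },
}

body = json.dumps(payload)
print(body)
# Send with: POST https://openrouter.ai/api/v1/chat/completions
# (header: Authorization: Bearer <OPENROUTER_API_KEY>)
```

Setting the ignore list once in your OpenRouter account settings has the same effect across all tools (Aider, Cline, etc.) without touching request code.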
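Relatedly, on the context-management point above: since speed (not cost) is the constraint, it helps to estimate how many tokens a file will add before /add-ing it in Aider. A rough sketch, assuming the common ~4 characters/token heuristic and hypothetical file contents; for exact counts, use Aider's /tokens command inside the chat:

```python
# Rough pre-check before /add-ing files to an Aider chat.
# ~4 chars/token is a crude heuristic for English text and code;
# the file contents and the 32k budget below are illustrative only.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def budget_report(files: dict[str, str], budget: int = 32_000) -> dict[str, int]:
    """Per-file token estimates, with a warning if the total blows the budget."""
    estimates = {name: estimate_tokens(body) for name, body in files.items()}
    total = sum(estimates.values())
    if total > budget:
        print(f"Warning: ~{total} tokens exceeds the {budget}-token budget")
    return estimates

# Hypothetical file contents for illustration:
files = {
    "src/App.tsx": "x" * 8_000,      # ~2k tokens
    "server/index.ts": "y" * 4_000,  # ~1k tokens
}
print(budget_report(files))
```

Keeping the estimate under a self-imposed budget is what keeps DeepSeek's speed usable, per the findings above.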

What are your impressions of DeepSeek? I'm about to test it against the newly proclaimed king, Gemini 2.5 Pro (Exp), and will release findings later