I battled DeepSeek V3 (0324) and Claude 3.7 Sonnet in a 250k Token Codebase...

I used Aider to test the coding skills of the new DeepSeek V3 (0324) vs Claude 3.7 Sonnet and boy did DeepSeek deliver. I tested their tool using Cline MCP servers (Brave Search and Puppeteer), their frontend bug fixing skills using Aider on a Vite + React Fullstack app. Some TLDR findings:

- They rank the same in tool use, which is a huge improvement from the previous DeepSeek V3

- DeepSeek holds its ground very well against 3.7 Sonnet in almost all coding tasks, backend and frontend

- To watch them in action: https://youtu.be/MuvGAD6AyKE

- DeepSeek still degrades a lot in inference speed once its context increases

- 3.7 Sonnet feels weaker than 3.5 in many larger codebase edits

- You need to actively manage context (Aider is best for this) using /add and /tokens in order to take advantage of DeepSeek. Not for cost of course, but for speed because it's slower with more context

- Aider's new /context feature was released after the video, would love to see how efficient and Agentic it is vs Cline/RooCode

What are your impressions of DeepSeek? I'm about to test it against the new king Gemini 2.5 Pro (Exp) and will release a comparison video later