o1-pro has arrived
Kevin Weil (OpenAI CPO) claims AI will surpass humans in competitive coding this year
So… there’s a limit on the Pro plan
Sergey Brin says AGI is within reach if Googlers work 60-hour weeks
Useful diagram for thinking about GPT-4.5
this is what Ilya saw
With 4.5, the question is whether we can continue to improve creativity without extraordinary costs - that is currently being worked on
According to LiveBench, 4.5 is the best non-thinking model
LiveBench has GPT-4.5 as the best non-thinking model
o3, which powers Deep Research, is capable of successfully handling 42% of the PR contributions made by OpenAI employees
OpenAI's announcement post seems to imply they might not even serve it in the API long term? Current pricing: $75 per million input tokens, $150 per million output tokens
the pricing is crazy...
Tomorrow will be interesting
anonymous-test passes the common sense test.
Information: GPT-4.5 is coming this week, but its performance on certain tasks has been mixed and worse than Claude 3.7 Sonnet.
Recent benchmark comparisons for different models on theoretical physics. Advanced models seem to easily solve undergraduate problems, but still struggle with research-level physics.
How long till we see global memory and realtime learning? Where a user can demonstrably prove/correct a mistake (like the 🍓 problem) and the model will integrate that knowledge and no longer make the same mistake with other users?
Sonnet 3.7 Extended Reasoning w/ 64k thinking tokens is the #1 model
Claude 3.7 Sonnet ranks 1st on SimpleBench.
Sonnet 3.7-thinking wins against o1 and o3 on LiveBench
Claude 3.7 Sonnet Thinking loses to o1 and o3-mini on LiveBench
3.7 Sonnet LiveBench results are in
Sam has got a kid now
Grok 3 = First right-leaning near-SOTA model (this is a good thing imo)
good or bad