Opus 4.6 has been awful for me and my team. It goes off the rails immediately, jumps to conclusions about what we want and ask for, and just keeps chugging along forever down whatever path it decides on, and nothing will stop it. 4.5 was awesome and is still our go-to model.
That's interesting. 4.6 is when AI finally started to become good in my eyes. I have a very strict plan phase: argue, plan, then partially execute. I like it to do the boilerplate, then I do the hard stuff myself and have it do a once-over at the end.
That said, I have had it try to debug something and just get stuck burning through tokens.
I have found this to be true too, and I thought I was the only one. Everyone is praising 4.6, and while it's great at agentic work and tool use, it does not follow instructions as cleanly as 4.5. I also feel like 4.5 was just way more efficient.
I think that's because not everyone does the same job within the same stack and constraints. I've yet to find an LLM that writes the kind of C++ I dabble in without my having to tweak it manually (or that truly understands our codebase). Conversely, I find that LLMs are now excellent at, for instance, Python and orchestration tasks. It's very situational.