To fill this out, I find o1-pro (and -preview when it was live) to be pretty good at filling in blindspots/spotting holistic bugs. I use Claude for day to day, and when Claude is spinning, o1 often can point out why. It's too slow for AI coding, and I agree that at default its responses aren't always satisfying.
That said, I think o1's code style is arguably better -- more concise, with better patterns. Claude needs a fair amount of prompting and oversight to not put out semi-shitty code in terms of structure and architecture.
In my mind: going from Slowest to Fastest, and Best Holistically to Worst, the list is:
1. o1-pro
2. Claude 3.5
3. Gemini 2 Flash
Flash is so fast that it's tempting to use it more, but it really needs to be kept to specific work on strong codebases without complex interactions.
Claude has a habit of sometimes just getting "lost."
Like, I'll have it in a project in Cursor and it will spin up ready-to-use components that use my site style, reference existing components, and follow all existing patterns.
Then on some days, it will even forget what language the project is in and start giving me Python code for a React project.
I find myself hopping between o1 and Sonnet pretty frequently these days, and my personal observation is that the quality of o1's output scales more directly with the quality of the prompting you give it.
In a way it almost feels like it's become too good at following instructions and takes your direction more literally. It doesn't seem to take the initiative to go the extra mile and fill in the blanks from your lazy input (note: many would see this as a good thing). Claude, on the other hand, feels more intuitive at discerning intent from a lazy prompt, which I may be prone to offering it at times when I'm simply trying out ideas.
However, if I take the time to write up a well-thought-out prompt detailing my expectations, I find I much prefer the code o1 creates. It's smarter in its approach, offers clever ideas I wouldn't have thought of, and its code is generally cleaner.
Or put another way: I can give Sonnet a lazy or detailed prompt and get a good result, while o1 will give me an excellent result with a well-thought-out prompt.
What this boils down to is that I find myself using Sonnet while brainstorming ideas, or when I simply don't know how I want to approach a problem. I can pitch it a feature idea the same way a product owner might pitch an idea to an engineer, and then iterate through sensible and intuitive ways of looking at the problem. Once I get a handle on how I'd like to implement a solution, I type up a spec and hand it off to o1 to crank out the code I intend to implement.
Have you found any tool or guide for writing better o1 prompts? This isn't the first time I've heard this about o1, but no one seems to know how to prompt it.
I just asked o1 a simple yes-or-no question about x86 atomics, and it gave me one of those "A or B" replies. The first answer was yes; the second answer was no.
o1 is for when all else fails. Sometimes it makes the same mistakes as weaker models if you give it simple tasks with very little context, but when a good, precise context is given, it usually outperforms other models.
I've found gemini-1206 to be the best, and we can use it for free (for now) in Google's AI Studio. It's number 1 on lmarena.ai for coding (and overall), and number 1 on BigCodeBench.
Yeah, I feel that for the chat use case, o1 is just too slow for me, and my queries aren't that complicated.
For coding, o1 is marvelous at Leetcode questions. I think it's the best teacher I could ever afford to teach me leetcoding, but I don't find myself having a lot of other use cases for o1 that are complex and require a really long reasoning chain.
Sonnet 3.5 remains the king of the hill by quite some margin.