To fill this out, I find o1-pro (and -preview when it was live) to be pretty good at filling in blindspots/spotting holistic bugs. I use Claude for day to day, and when Claude is spinning, o1 often can point out why. It's too slow for AI coding, and I agree that at default its responses aren't always satisfying.
That said, I think o1's code style is arguably better -- more concise, with better patterns. Claude needs a fair amount of prompting and oversight to not put out semi-shitty code in terms of structure and architecture.
In my mind: going from Slowest to Fastest, and Best Holistically to Worst, the list is:
1. o1-pro
2. Claude 3.5
3. Gemini 2 Flash
Flash is so fast that it's tempting to use it more, but it really needs to be kept to specific work on strong codebases without complex interactions.
Claude has a habit of sometimes just getting "lost."
Like, I'll have it in a project in Cursor and it will spin up ready-to-use components that use my site style, reference existing components, and follow all existing patterns.
Then on some days, it will even forget what language the project is in and start giving me Python code for a React project.
I find myself hopping between o1 and Sonnet pretty frequently these days, and my personal observation is that the quality of o1's output scales more directly with the quality of the prompting you give it.
In a way it almost feels like it's become too good at following instructions and takes your direction more literally. It doesn't seem to take the initiative to go the extra mile and fill in the blanks from your lazy input (note: many would see this as a good thing). Claude, on the other hand, feels more intuitive at discerning intent from a lazy prompt, which I may be prone to offering it at times when I'm simply trying out ideas.
However, if I take the time to write up a well-thought-out prompt detailing my expectations, I find I much prefer the code o1 creates. It's smarter in its approach, offers clever ideas I wouldn't have thought of, and its code is generally cleaner.
Or put another way: I can give Sonnet a lazy or detailed prompt and get a good result, while o1 will give me an excellent result with a well-thought-out prompt.
What this boils down to is that I find myself using Sonnet while brainstorming ideas, or when I simply don't know how I want to approach a problem. I can pitch it a feature idea the same way a product owner might pitch an idea to an engineer, and then iterate through sensible and intuitive ways of looking at the problem. Once I get a handle on how I'd like to implement a solution, I type up a spec and hand it off to o1 to crank out the code I intend to implement.
Have you found any tool or guide for writing better o1 prompts? This isn't the first time I've heard this about o1, but no one seems to know how to prompt it.
I just asked o1 a simple yes-or-no question about x86 atomics, and it gave me one of those "A or B" replies. The first answer was yes; the second answer was no.
o1 is for when all else fails. Sometimes it makes the same mistakes as weaker models if you give it simple tasks with very little context, but when a good, precise context is given, it usually outperforms other models.
I've found gemini-1206 to be the best, and we can use it for free (for now) in Google's AI Studio. It's number 1 on lmarena.ai for coding (and overall), and number 1 on BigCodeBench.
Yeah, I feel that for the chat use case, o1 is just too slow for me, and my queries aren't that complicated.
For coding, o1 is marvelous at Leetcode questions. I think it's the best teacher I could ever afford to teach me leetcoding, but I don't find myself having a lot of other use cases for o1 that are complex and require a really long reasoning chain.
Sonnet 3.5 remains the king of the hill by quite some margin.