I don't think the evidence bears that out [0]. I agree that GPT-4 is way better than GPT-3.5, but I don't think most of the OSS models are even close to GPT-3.5. Vicuna comes closer for simple tasks/conversation, but it still doesn't match GPT-3.5 elsewhere IMO, even though GPT-3.5 is also not great at complex tasks.
This is fair; my only evidence is my personal experience, most of which has consisted of trying models on Huggingface or similar, which isn't persuasive.
I will say, based on my cursory glance, that a lot of the tasks here seem odd for a chat AI, though there are certainly applications that might use them. E.g. asking whether someone insulted another person given a transcript of their conversation seems like a somewhat tough sell to me. Nevertheless, ChatGPT performed better. Was that because ChatGPT has a better architecture, a better training set overall, or simply more examples related to these kinds of questions in its training set? Is that even knowable?