I also feel the frustration of LLM reverse-compression - when a whole article is generated from a single sentence. But when I post something edited by AI, it is usually the result of a long back-and-forth of editing and revising. I guess I could post the whole conversation thread - but it would be very long.
Personally I would just like to read the best comments.
I think these rules should have a pre-determined shelf life. They are not bad in the current state of the world - they push in the right direction - but they complicate the law, and I bet there will be many second-level outcomes that are hard to predict now. Besides that, once the capabilities for reuse are built, they should be sustainable, so those second-level outcomes will actually dominate.
The instructions are standard documents, but that is not all. What the system adds is an index of all skills, built from their descriptions, that is passed to the LLM in each conversation. The idea is to let the LLM read a skill when it is needed instead of loading it into context upfront. Humans use indexes too, but not in this way. There are some analogies with GUIs, though, and how they improve the discoverability of features for humans.
I wish they had arranged it around READMEs. I have a directory with my tasks and a README.md there - before Codex had skills, it already understood that it needed to read the README when dealing with tasks. The skills system is less directory-dependent, so it is a bit more universal, but I am not sure that is really needed.
Hmm - maybe I should not call it an index - people look things up in an index when needed. Here the whole index is inserted into the conversation - it is as if, when starting a task, a human read the entire table of contents of the manual for that task.
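The mechanism described above can be sketched roughly like this. Everything here is an assumption for illustration: the directory layout and frontmatter format are loosely modeled on the SKILL.md convention, and `build_skill_index` is a hypothetical name, not an actual API of any harness.

```python
import os
import re

def build_skill_index(skills_dir="skills"):
    """Collect one line per skill from each skill's SKILL.md description.

    The resulting index is what would be injected into every conversation;
    the skill bodies themselves are only read later, on demand.
    """
    entries = []
    for name in sorted(os.listdir(skills_dir)):
        path = os.path.join(skills_dir, name, "SKILL.md")
        if not os.path.isfile(path):
            continue
        text = open(path).read()
        # Pull the description line out of the frontmatter (assumed format).
        m = re.search(r"^description:\s*(.+)$", text, re.MULTILINE)
        desc = m.group(1).strip() if m else "(no description)"
        entries.append(f"- {name}: {desc}")
    return "\n".join(entries)
```

The point of the sketch is the asymmetry: the index is tiny (one line per skill) and always present, while the full skill documents stay on disk until the model decides one is relevant.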
Claude reads from .claude/instructions.md whenever you make a new convo as a default thing. I usually have Claude add things like project layout info and summaries, preferred tooling to use, etc. So there's a reasonable expectation of how it should run. If it starts 'forgetting' I tell it to re-read it.
No, Claude Code reads the CLAUDE.md in the root of your project. It's case sensitive, so it has to be exactly that, too. GitHub Copilot reads from .github/copilot-instructions.md and supposedly AGENTS.md. Antigravity reads AGENTS.md and pulls subagents and the like from a .agents directory. This is probably why you have to remind it to re-read it so much - the harness isn't loading it for you.
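A quick way to check which of these files a repo actually has. The paths below are as reported in this thread (verify against each tool's current docs), and `loaded_instructions` is a hypothetical helper, not part of any of these tools:

```python
import os

# Instruction files different coding harnesses reportedly look for,
# per the discussion above - not an authoritative list.
HARNESS_FILES = {
    "Claude Code": "CLAUDE.md",
    "GitHub Copilot": ".github/copilot-instructions.md",
    "Antigravity": "AGENTS.md",
}

def loaded_instructions(repo_root="."):
    """Return which harnesses would find their instruction file here."""
    return {
        tool: path
        for tool, path in HARNESS_FILES.items()
        if os.path.isfile(os.path.join(repo_root, path))
    }
```

If your tool isn't in the returned dict, that's a hint you're relying on manual "go re-read the file" prompts rather than the harness loading it for you.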
> What the system adds is an index of all skills, built from their descriptions, that is passed to the LLM in each conversation. The idea is to let the LLM read a skill when it is needed instead of loading it into context upfront.
This is different from Swagger / OpenAPI how?
I get that cross-trained web front-end devs set a new low bar for professional amnesia and not-invented-here-ism, but maybe we could not do that yet another time?
> Why not just extend the OpenAPI specification to skills?
Because approximately nothing in the existing OpenAPI specification is relevant to the task, and nothing needed for the task is relevant to the current OpenAPI use case, so trying to jam one use case into a tool designed for the other would be pure nonsense.
It’s like needing to drive nails and asking why grab a hammer when you already have a screwdriver.
Reasoning is recursive - you cannot isolate where it should be symbolic and where it should be LLM-based (fuzzy/neural). This is the idea that started https://github.com/zby/llm-do - there is also RLM: https://alexzhang13.github.io/blog/2025/rlm/ - RLM is simpler, but my approach also has some advantages.
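A toy illustration of what "you cannot isolate symbolic from neural" means in practice. This is my own stub, not the actual llm-do or RLM code: each branch of the recursion decides locally whether a subproblem is exact or fuzzy, and either kind can appear nested inside the other.

```python
def llm_stub(prompt):
    """Stand-in for a real model call - replace with an actual LLM client."""
    return f"[model answer to: {prompt}]"

def solve(task):
    """Recursively evaluate a task tree of (kind, payload) nodes.

    'sum' nodes are handled symbolically with exact arithmetic;
    'fuzzy' nodes are delegated to the (stubbed) model. Symbolic
    nodes may contain further task nodes, so the two interleave.
    """
    kind, payload = task
    if kind == "sum":  # symbolic branch: exact, cheap, verifiable
        return sum(
            solve(sub) if isinstance(sub, tuple) else sub
            for sub in payload
        )
    if kind == "fuzzy":  # neural branch: hand the subproblem to the model
        return llm_stub(payload)
    raise ValueError(f"unknown task kind: {kind}")
```

The interesting property is that the split point between the two branches is a per-node decision made inside the recursion, not a fixed boundary drawn up front.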
I think the AI community is sleeping hard on proper symbolic recursion. The computer has gigabytes of very accurate "context" available if you start stacking frames. Any strategy that happens inside token space will never scale the same way.
Depth first, slow turtle recursion is likely the best way to reason through the hardest problems. It's also much more efficient compared to things that look more like breadth first search (gas town).
I only agree with that statement if you're drawing from the set of all possible problems a priori. For any individual domain I think it's likely you can bound your analytic. This ties into the no free lunch theorem.
Pi has probably the best architecture, and being written in JavaScript, it is well positioned to use the browser sandbox architecture that I think is the future for AI agents.
I would double down on that question - do skills reliably work for you? I mean, are they reliably injected when there is a need, as opposed to being actively called for (which in my opinion defeats the purpose of skills, because I can always ask the LLM to read a document and then do something with the new knowledge)?
I have a feeling that Codex still does not do this reliably - so I still use normal README files, which it loads quite intelligently, and that works better than discovery via skills.