Parsing anything from a potential adversary needs to account for failure. Unbounded recursion is secure (ie fails safely) if the compiler is working properly.
As to DoS, without looking at the code I'm unclear why various approaches to bounding resource consumption wouldn't have worked. I assume something specific to this library and how it is used must have prevented the obvious approaches. Still, not an issue in the general case.
Guarding against unbounded recursion requires both compiler support and runtime environment support: you have to use enough resources to handle legitimate queries, but small enough memory constraints that a "query of death" doesn't kill nodes that are expensive to reactivate. Even then, by their very nature queries-of-death are usually hard to detect and a much simpler solution is something you can do in the static space, such as put an arbitrary hard-bound on recursion depth far below your resource constraints so you can fail the query without losing the whole processing node.
Google protobuffers have buried deep within at least their C++ parser an arbitrary hard limit for nesting depth (I think it may be 32). It's annoying when you hit it, but it's there for a reason.
> Guarding against unbounded recursion requires both compiler support and runtime environment support
I feel like this is similar in spirit to saying "guarding against infinite loops requires both ...".
Where resource consumption is concerned, as you pointed out you can track that manually. Presumably you have to do that anyway, since the iterative case will also need to give up and fail the task at some point.
I really don't see where recursion itself introduces an issue here. I guess if you expect to pass through a great many nodes that don't otherwise trigger resource allocation except for the stack frame, and the compiler can't optimize the activation records away, it could be problematic. That's pretty specific though. Is that really the case for a typical parser?
The stack frame also holds local variables. It's not just a return address. If your function requires 3 local variables then each call requires 3 stack slots.