In practice
Most useful for tasks with clear feedback, such as failing tests or wrong answers. The agent learns from its mistakes within the same session, with no fine-tuning. This often raises success rates on coding and reasoning benchmarks.
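As a rough illustration, the loop described above (attempt, read the feedback, note what went wrong, retry in the same session) might look like the following minimal sketch. Every name in it (`run_with_reflection`, `attempt`, `evaluate`, `reflect`, and the toy stand-ins) is hypothetical and chosen for this example only; a real agent would replace the toy functions with a model call, a test runner, and a self-critique prompt.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces, assumed for illustration only:
#   attempt(task, notes)      -> candidate answer (stands in for a model call)
#   evaluate(answer)          -> (passed, feedback) (stands in for a test runner)
#   reflect(answer, feedback) -> short note on what went wrong
Attempt = Callable[[str, List[str]], str]
Evaluate = Callable[[str], Tuple[bool, str]]
Reflect = Callable[[str, str], str]


def run_with_reflection(
    task: str,
    attempt: Attempt,
    evaluate: Evaluate,
    reflect: Reflect,
    max_trials: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Retry a task, carrying notes about earlier failures between trials."""
    notes: List[str] = []  # in-session memory; no model weights are updated
    for _ in range(max_trials):
        answer = attempt(task, notes)
        passed, feedback = evaluate(answer)
        if passed:
            return answer, notes
        # Turn the failure signal into a note the next attempt can read.
        notes.append(reflect(answer, feedback))
    return None, notes  # gave up after max_trials failed attempts


# Toy stand-ins so the sketch runs without a model.
def toy_attempt(task: str, notes: List[str]) -> str:
    # Pretends the agent only gets the sign right after being told about it.
    return "-4" if any("sign" in note for note in notes) else "4"


def toy_evaluate(answer: str) -> Tuple[bool, str]:
    expected = "-4"
    if answer == expected:
        return True, "pass"
    return False, f"expected {expected}, got {answer}"


def toy_reflect(answer: str, feedback: str) -> str:
    return f"Check the sign of the result ({feedback})."


if __name__ == "__main__":
    answer, notes = run_with_reflection(
        "Compute 2 - 6", toy_attempt, toy_evaluate, toy_reflect
    )
    print(answer, notes)
```

The notes list is the only thing that changes between trials, which is why this kind of loop needs a clear feedback signal: without a reliable `evaluate` step there is nothing concrete for the next attempt to learn from.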