Reward Shaping
The design of reward signals that guide reinforcement learning without overfitting to proxy measures. Poorly shaped rewards lead to reward hacking: the agent optimizes the metric instead of solving the real task. LLMs now automate reward design (Eureka/NVIDIA): GPT-4 writes Python reward functions, runs them in simulation, and iterates based on agent performance. It is critical for robotics, game AI, and RLHF with human feedback.
In practice
A researcher training a robot to walk must balance rewards for speed, stability, and energy consumption — too much emphasis on speed produces bizarre gaits or reward hacking. With Eureka, the task is described in natural language and an LLM automatically generates the reward function, running it in Isaac Gym simulation and refining weights based on performance metrics. The same principle applies to RLHF: the language model's reward function must capture 'real utility', not just 'sounds convincing'.
Related terms
Seen in the wild
0 entries mentioning itNo archive entry mentions it explicitly. Appears in broader contexts.