- Based on how RL and LLMs interact with each other
- RL4LLM
- : use RL to improve the performance of LLMs
- fine-tune existing LLM
- improve prompt
- LLM4RL
- : use an LLM to improve an RL model whose task is not inherently related to natural language
- further classified by where the LLM takes effect:
- reward shaping,
- goal generation,
- policy function
- utilizes 3 abilities of the LLM:
- zero-shot / few-shot learning ability
- real-world knowledge: helps with exploration
- reasoning capabilities
- RL+LLM
- : work together in a common planning framework
- without either of them contributing to training or fine-tuning of the other
- further: w/ vs. w/o natural language feedback

RL4LLM-Fine-tuning: e.g., RL tweaks the model’s parameters
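A minimal sketch of the fine-tuning idea, assuming a toy PyTorch LM and a made-up reward model; real RLHF-style pipelines typically use PPO with a KL penalty to a reference model rather than plain REINFORCE:

```python
# Sketch: REINFORCE-style RL fine-tuning of a toy LM. TinyLM and reward_model()
# are stand-ins for a pretrained LLM and a learned reward model.
import torch
import torch.nn as nn

VOCAB, HIDDEN, GEN_LEN = 100, 32, 8

class TinyLM(nn.Module):
    """Toy autoregressive LM standing in for a pretrained LLM."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.cell = nn.GRUCell(HIDDEN, HIDDEN)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def step(self, token, h):
        h = self.cell(self.embed(token), h)
        return self.head(h), h

def reward_model(tokens):
    """Stand-in for a reward model / human preference score (toy: prefer even ids)."""
    return (tokens % 2 == 0).float().mean()

lm = TinyLM()
opt = torch.optim.Adam(lm.parameters(), lr=1e-3)

for _ in range(200):                                  # RL fine-tuning loop
    token = torch.zeros(1, dtype=torch.long)          # BOS
    h = torch.zeros(1, HIDDEN)
    log_probs, tokens = [], []
    for _ in range(GEN_LEN):                          # sample a completion from the policy
        logits, h = lm.step(token, h)
        dist = torch.distributions.Categorical(logits=logits)
        token = dist.sample()
        log_probs.append(dist.log_prob(token))
        tokens.append(token)
    R = reward_model(torch.stack(tokens))             # scalar reward for the whole sequence
    loss = -R * torch.stack(log_probs).sum()          # push parameters toward rewarded text
    opt.zero_grad(); loss.backward(); opt.step()
```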
RL4LLM-Prompt Engineering: RL iteratively updates the prompt
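A minimal sketch of RL-driven prompt optimization, assuming a fixed pool of candidate prompts and a stand-in evaluator; published methods typically learn a policy over prompt tokens, but an epsilon-greedy bandit keeps the idea small:

```python
# Sketch: an epsilon-greedy bandit picks which candidate prompt to send to a
# frozen LLM, using downstream task score as reward. evaluate_prompt() is a
# toy stand-in for "query the LLM on a task batch and score the answers".
import random

PROMPTS = [
    "Answer concisely:",
    "Think step by step, then answer:",
    "You are an expert. Answer:",
]

def evaluate_prompt(prompt: str) -> float:
    """Stand-in: would call the frozen LLM and return task accuracy."""
    base = {0: 0.55, 1: 0.70, 2: 0.62}[PROMPTS.index(prompt)]
    return base + random.uniform(-0.05, 0.05)          # noisy reward

values, counts = [0.0] * len(PROMPTS), [0] * len(PROMPTS)
EPS = 0.1

for t in range(500):
    i = random.randrange(len(PROMPTS)) if random.random() < EPS \
        else max(range(len(PROMPTS)), key=lambda j: values[j])
    r = evaluate_prompt(PROMPTS[i])
    counts[i] += 1
    values[i] += (r - values[i]) / counts[i]            # incremental mean update

print("best prompt:", PROMPTS[max(range(len(PROMPTS)), key=lambda j: values[j])])
```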
LLM4RL-Reward Shaping: the LLM designs the reward function of the RL agent
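A minimal sketch of LLM-based reward shaping, where llm_alignment_score() stands in for an actual LLM call that rates how well a described state matches the task:

```python
# Sketch: the environment reward is sparse, and an LLM scores a text
# description of each state against the task goal; the agent trains on the sum.
import random

def llm_alignment_score(state_text: str, goal_text: str) -> float:
    """Stand-in: would prompt an LLM, e.g. 'Rate 0-1 how much this state progresses the goal'."""
    return 1.0 if "key" in state_text and "key" in goal_text else 0.1

def shaped_reward(env_reward: float, state_text: str, goal_text: str, beta: float = 0.5) -> float:
    return env_reward + beta * llm_alignment_score(state_text, goal_text)

# usage inside an ordinary RL loop
goal = "pick up the key and open the door"
for step in range(3):
    state_text = random.choice(["agent near key", "agent at empty wall"])
    env_reward = 0.0                                   # sparse: only the final step pays off
    r = shaped_reward(env_reward, state_text, goal)
    print(step, state_text, "->", round(r, 2))         # agent trains on r instead of env_reward
```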
LLM4RL-Goal Generation: the LLM is used for goal setting in goal-conditioned RL
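A minimal sketch of LLM goal generation, with llm_suggest_goals() standing in for an LLM that proposes plausible subgoals from a state description:

```python
# Sketch: the LLM proposes subgoals, the goal-conditioned policy is conditioned
# on one of them, and the agent gets intrinsic reward for achieving it.
def llm_suggest_goals(state_text: str) -> list[str]:
    """Stand-in: would prompt an LLM, e.g. 'List useful things to do next.'"""
    if "tree" in state_text:
        return ["chop the tree", "collect wood"]
    return ["explore the map"]

def intrinsic_reward(achieved: str, suggested: list[str]) -> float:
    return 1.0 if achieved in suggested else 0.0

state_text = "agent standing next to a tree"
goals = llm_suggest_goals(state_text)              # condition the policy on one of these goals
print("suggested goals:", goals)
print("reward for 'collect wood':", intrinsic_reward("collect wood", goals))
```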
LLM4RL-Policy: the LLM represents the policy function of the RL agent
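A minimal sketch of the LLM-as-policy idea: observations are rendered as text, the LLM picks one of the legal actions, and the answer is parsed back into an environment action; llm_choose() is a stand-in for the actual LLM call:

```python
# Sketch: the LLM acts as the policy over a fixed discrete action set.
ACTIONS = ["move_left", "move_right", "pick_up", "open_door"]

def llm_choose(observation_text: str, actions: list[str]) -> str:
    """Stand-in: would prompt an LLM with the observation and the action list."""
    return "pick_up" if "key" in observation_text else "move_right"

def llm_policy(observation_text: str) -> str:
    choice = llm_choose(observation_text, ACTIONS)
    return choice if choice in ACTIONS else ACTIONS[0]   # guard against malformed output

print(llm_policy("a key lies on the floor in front of the agent"))   # -> pick_up
print(llm_policy("an empty corridor stretches ahead"))               # -> move_right
```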
RL+LLM-No Language Feedback: the LLM plans over pre-trained RL skills without natural language feedback in the loop
RL+LLM-Language Feedback: same planning setup, but natural language feedback flows back to the LLM planner
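A minimal sketch of the RL+LLM setting: frozen, pre-trained RL skills are sequenced by an LLM planner, and the with/without language feedback split is just whether the outcome text is fed back into the planner; llm_plan() and the skills are toy stand-ins:

```python
# Sketch: an LLM planner composes frozen RL skill policies; neither side trains the other.
SKILLS = {
    "go_to_kitchen": lambda: "arrived in kitchen",
    "pick_up_cup":   lambda: "picked up the cup",
    "pour_water":    lambda: "water poured",
}
PLAN = ["go_to_kitchen", "pick_up_cup", "pour_water"]   # what the LLM planner would emit

def llm_plan(task: str, feedback: list[str]) -> str:
    """Stand-in for an LLM planner call: task + language feedback so far -> next skill."""
    return PLAN[len(feedback)] if len(feedback) < len(PLAN) else "done"

task = "bring me a cup of water"
feedback: list[str] = []            # the 'no language feedback' variant simply never reads this
while (skill := llm_plan(task, feedback)) != "done":
    outcome = SKILLS[skill]()       # execute the frozen, pre-trained RL skill policy
    print(skill, "->", outcome)
    feedback.append(outcome)        # 'with language feedback': outcomes go back into the prompt
```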
LLM training already involves RL in the form of RLHF; this paper is concerned with RL used together with already-trained LLMs.
Q:
- what is the SOTA representative in each category?
Links
- RL