RLAIF Research
PG Research Article for Statistical Natural Language Processing Module at UCL
Explored the Reinforcement Learning from AI Feedback (RLAIF) and Reinforcement Learning from Human Feedback (RLHF) paradigms, examining how they are used to align large language models with human preferences. The study compared several RLHF methods and implemented PPO (Proximal Policy Optimization) from scratch to better understand the complexities of human preference learning.
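
As a rough illustration of the algorithm named above, the sketch below shows the clipped surrogate objective at the core of PPO, written in PyTorch. The function name, argument names, and clipping constant are illustrative assumptions, not taken from the project's actual implementation.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    log_probs_new: log-probabilities of the taken actions under the current policy
    log_probs_old: log-probabilities under the policy that collected the data
    advantages:    advantage estimates (e.g. from GAE), same shape as the log-probs
    """
    # Probability ratio r_t(theta) = pi_theta / pi_theta_old, computed in log space.
    ratio = torch.exp(log_probs_new - log_probs_old)

    # Unclipped and clipped surrogate terms; taking the elementwise minimum
    # discourages updates that move the policy too far from the old policy.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -torch.min(unclipped, clipped).mean()
```

In an RLHF/RLAIF setting, the advantages would typically be derived from a learned reward model (scored by human or AI preference labels), with an added KL penalty against the reference model; those pieces are omitted here for brevity.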