PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
last commit: Nov 23, 2025
stars: 1,603 (7d: +2, 30d: +4, 90d: +20)
## star history