PKU-Alignment/safe-rlhf

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
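
For context, the repository accompanies the Safe RLHF paper, which frames alignment as constrained policy optimization: maximize a learned helpfulness reward while keeping a learned harmlessness cost below a threshold, solved via a Lagrangian relaxation. A minimal sketch of that objective follows; the symbols $R_\phi$ (reward model), $C_\psi$ (cost model), $\lambda$ (Lagrange multiplier), and threshold $d$ are conventional choices here, not taken from this listing:

```latex
% Constrained RLHF objective (sketch): maximize expected reward
% subject to an expected-cost (harmlessness) constraint.
\max_{\theta} \;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
  \bigl[ R_\phi(x, y) \bigr]
\quad \text{s.t.} \quad
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
  \bigl[ C_\psi(x, y) \bigr] \le d

% Equivalent Lagrangian min-max form optimized in practice:
\min_{\lambda \ge 0} \, \max_{\theta} \;
  \mathbb{E}\bigl[ R_\phi(x, y) - \lambda \bigl( C_\psi(x, y) - d \bigr) \bigr]
```
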

Last commit: Nov 23, 2025
Stars: 1,593