simpo-training

Simple Preference Optimization for LLM alignment

Category

AI & Machine Learning

Developer

davila7

Updated

Jan 2026

Description

Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO with better reported performance (up to +6.4 points over DPO on AlpacaEval 2.0). No reference model is needed, which makes training more efficient than DPO. Use it for preference alignment when you want simpler, faster training than DPO or PPO.

Skill File

SKILL.md
Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO with better reported performance (up to +6.4 points over DPO on AlpacaEval 2.0). No reference model is needed, which makes training more efficient than DPO. Use it for preference alignment when you want simpler, faster training than DPO or PPO.
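
To make "reference-free" concrete, here is a minimal sketch of the SimPO objective for a single preference pair, written in plain Python rather than taken from this skill's own files. It assumes the per-token log-probabilities of the chosen and rejected responses under the policy being trained are already available. The function name `simpo_loss` and the default values for `beta` and `gamma` are illustrative assumptions, not settings from the skill.

```python
import math


def simpo_loss(chosen_token_logps, rejected_token_logps, beta=2.0, gamma=1.0):
    """SimPO loss for one (chosen, rejected) pair.

    Inputs are lists of per-token log-probabilities under the policy itself.
    No reference model appears anywhere, unlike DPO.
    beta and gamma defaults are illustrative placeholders (assumption).
    """
    # Implicit reward: beta times the length-normalized (average) log-probability.
    reward_chosen = beta * sum(chosen_token_logps) / len(chosen_token_logps)
    reward_rejected = beta * sum(rejected_token_logps) / len(rejected_token_logps)

    # The chosen reward should exceed the rejected reward by at least the margin gamma.
    z = reward_chosen - reward_rejected - gamma

    # -log(sigmoid(z)) == log(1 + exp(-z)), computed in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical token log-probabilities for a preferred and a dispreferred response.
loss = simpo_loss([-0.2, -0.1, -0.3], [-1.0, -1.5])
```

Averaging over tokens keeps longer responses from being rewarded or penalized purely for their length, and the margin `gamma` pushes the preferred response to win by a clear gap rather than by any positive amount.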

Tags

AI

Information

Developer: davila7
Category: AI & Machine Learning
Created: Jan 15, 2026
Updated: Jan 15, 2026