simpo-training

Simple Preference Optimization for LLM alignment

Category: AI & Machine Learning

Developer: davila7

Updated: Jan 2026

Tags: 1 total

Description

Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO with better reported performance (up to +6.4 points over DPO on AlpacaEval 2.0). Because no reference model is needed, training is more efficient than DPO. Use it for preference alignment when you want simpler, faster training than DPO or PPO.
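To make "reference-free" concrete, here is a minimal sketch of the SimPO objective for a single preference pair, written in plain Python. It assumes you already have per-token log-probabilities of the chosen and rejected responses under the policy being trained. The function names and the default `beta` and `gamma` values are illustrative placeholders, not settings taken from this skill.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def simpo_loss(chosen_token_logps, rejected_token_logps, beta=2.0, gamma=1.0):
    """SimPO loss for one (chosen, rejected) pair.

    Inputs are per-token log-probabilities under the policy model only.
    beta and gamma here are placeholder values, not tuned recommendations.
    """
    # Implicit reward: beta times the length-normalized (average) log-probability.
    chosen_reward = beta * sum(chosen_token_logps) / len(chosen_token_logps)
    rejected_reward = beta * sum(rejected_token_logps) / len(rejected_token_logps)

    # Bradley-Terry style margin with a target reward margin gamma.
    # No reference-model log-probabilities appear anywhere in this expression.
    margin = chosen_reward - rejected_reward - gamma

    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# Usage: the loss is lower when the policy already prefers the chosen response.
good = simpo_loss([-0.1, -0.2], [-1.0, -1.2, -1.1])
bad = simpo_loss([-1.0, -1.2, -1.1], [-0.1, -0.2])
print(good < bad)
```

Compared with DPO, the only model evaluated is the policy itself, which is where the memory and compute savings come from: there is no second forward pass through a frozen reference model.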

Skill File

SKILL.md

Tags

AI

Information

Developer: davila7

Category: AI & Machine Learning

Created: Jan 15, 2026

Updated: Jan 15, 2026
