awq-quantization

Description

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

Skill File

SKILL.md
Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

Tags

ml

Information

Developer: davila7
Category: AI & Machine Learning
Created: Jan 15, 2026
Updated: Jan 15, 2026
