awq-quantization
Description
Activation-aware weight quantization (AWQ) for 4-bit LLM compression, giving roughly 3x inference speedup over FP16 with minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference and better accuracy preservation than GPTQ, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.
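As a concrete starting point, the sketch below shows one common way to produce a 4-bit AWQ checkpoint with the AutoAWQ library. The model path, output directory, and quant_config values are illustrative assumptions, and the exact API may differ between AutoAWQ releases.

```python
# Minimal AWQ quantization sketch using AutoAWQ (pip install autoawq).
# The model path, output directory, and quant_config values below are assumptions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # any supported 7B-70B checkpoint
quant_path = "mistral-7b-instruct-awq"             # where the 4-bit model is written

# 4-bit weights with group size 128 is the typical AWQ configuration
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Runs activation-aware calibration on a small default dataset, then packs 4-bit weights
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The resulting directory can then be loaded by AWQ-aware runtimes (for example, vLLM accepts quantization="awq"); the memory savings and the speedup quoted above depend on the kernel backend and GPU.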