llama-cpp

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware

Category: AI & Machine Learning

Developer: davila7

Updated: Jan 2026

Tags: 1 total

Description

Runs LLM inference on CPUs, Apple Silicon, and consumer GPUs, with no NVIDIA hardware required. Use it for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or any setup where CUDA is unavailable. Supports GGUF quantization (1.5 to 8 bits per weight) to reduce memory use, with a stated 4-10× speedup over PyTorch on CPU.
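To make the memory claim concrete, here is a rough back-of-the-envelope sketch of how weight storage scales with bits per weight. It is not part of llama.cpp: the function name `estimate_weight_memory_gib` and the 7-billion-parameter model size are assumptions chosen for illustration, and the estimate ignores the KV cache, activations, and any per-block overhead that real GGUF files carry.

```python
def estimate_weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a model with n_params weights.

    Hypothetical helper for illustration only: real GGUF files add metadata
    and per-block scale factors, so actual sizes are somewhat larger.
    """
    if n_params <= 0 or bits_per_weight <= 0:
        raise ValueError("n_params and bits_per_weight must be positive")
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30                    # bytes -> GiB


if __name__ == "__main__":
    n_params = 7e9  # assumed model size, not taken from the listing
    # 16 bits is an unquantized half-precision baseline; the rest span
    # the 1.5-8 bit range named in the description.
    for bits in (16, 8, 4, 1.5):
        gib = estimate_weight_memory_gib(n_params, bits)
        print(f"{bits:>4} bits/weight: ~{gib:.2f} GiB")
```

Under these assumptions, weight memory scales linearly with bits per weight, so going from 16 to 4 bits cuts it to a quarter. Whether that fits a given device still depends on context length and runtime buffers.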

Skill File

SKILL.md

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.

Tags: AI

Information

Developer: davila7

Category: AI & Machine Learning

Created: Jan 15, 2026

Updated: Jan 15, 2026
