jankais3r/LLaMA_MPS
Enables LLaMA and Stanford-Alpaca model inference on Apple Silicon via MPS (Metal Performance Shaders) backend.

This repository provides Python scripts for running LLaMA and Stanford-Alpaca language model inference on Apple Silicon GPUs. It uses PyTorch’s MPS backend, which executes tensor operations on the GPU through Metal. The project includes scripts for downloading the model weights, optionally resharding the larger models (13B/30B/65B), and converting the fine-tuned Alpaca weights. Users can run inference in either auto-complete mode or instruction-response (ChatGPT-like) mode.
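As an illustration of what targeting the MPS backend means in PyTorch, the sketch below selects the `mps` device when it is available and runs a small matrix multiply there. It is not code from this repository. The helper names `select_device` and `run_matmul` are hypothetical. The only PyTorch APIs it relies on are `torch.backends.mps.is_available()` (present since PyTorch 1.12) and the `device=` argument to tensor constructors. It falls back to the CPU when PyTorch or MPS is missing.

```python
# Hypothetical helpers, not part of LLaMA_MPS: a minimal sketch of
# selecting PyTorch's MPS (Metal) device with a CPU fallback.


def select_device() -> str:
    """Return "mps" if PyTorch can use Metal on this machine, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no GPU backend can be used.
        return "cpu"
    # torch.backends.mps does not exist in PyTorch versions before 1.12.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def run_matmul(device: str, n: int = 4):
    """Multiply two n x n all-ones matrices on `device`.

    Returns the product as a CPU tensor, or None if PyTorch is missing.
    """
    try:
        import torch
    except ImportError:
        return None
    a = torch.ones(n, n, device=device)
    b = torch.ones(n, n, device=device)
    # When device == "mps" the product is computed on the Apple GPU;
    # .cpu() copies the result back to host memory.
    return (a @ b).cpu()


if __name__ == "__main__":
    device = select_device()
    print(device, run_matmul(device))
```

Each entry of the product equals `n` because every row of ones is multiplied by every column of ones. Placing both tensors on the same device up front avoids implicit host-GPU copies, which is the same reason model weights are moved to `mps` once before inference.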