airaria/Visual-Chinese-LLaMA-Alpaca

A multimodal Chinese LLaMA model extended with visual encoding to process and understand image inputs alongside text.

Velocity (7d): +0.4 ★/day · Trend: steady

VisualCLA extends the Chinese LLaMA/Alpaca foundation model with image encoding modules, enabling it to process visual information. It uses Chinese image-text pairs for multimodal pretraining to align visual and textual representations, followed by instruction tuning on multimodal datasets to improve instruction following and conversational abilities. The project provides inference code and deployment scripts via Gradio and Text-Generation-WebUI.
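
To make the idea of aligning visual and textual representations concrete, here is a minimal toy sketch of one common pattern: image features are projected into the language model's embedding space and prepended to the text token embeddings. This is not taken from the VisualCLA code. The function names, the single linear projection, and the toy dimensions are all assumptions chosen for illustration, and the project's real modules may differ.

```python
import random

def linear_project(features, weight):
    """Map each visual feature vector (length d_vis) to length d_model.

    `weight` is a d_vis x d_model matrix stored as nested lists.
    Hypothetical stand-in for a learned projection layer.
    """
    d_model = len(weight[0])
    return [
        [sum(f[i] * weight[i][j] for i in range(len(f))) for j in range(d_model)]
        for f in features
    ]

def build_multimodal_input(image_features, text_embeddings, weight):
    """Prepend projected image features to the text token embeddings."""
    projected = linear_project(image_features, weight)
    return projected + text_embeddings

rng = random.Random(0)
D_VIS, D_MODEL = 4, 6  # toy sizes, assumed for illustration only

# Three image "patch" features and five text token embeddings (random toy data).
image_features = [[rng.gauss(0, 1) for _ in range(D_VIS)] for _ in range(3)]
text_embeddings = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(5)]
weight = [[rng.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(D_VIS)]

sequence = build_multimodal_input(image_features, text_embeddings, weight)
print(len(sequence), len(sequence[0]))  # 8 positions, each of width 6
```

In a setup like this, multimodal pretraining would adjust the projection weights so that the image positions carry information the language model can use, and instruction tuning would then train on the combined sequence.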