
InternScience/GraphGen

Synthetic data generation framework that creates knowledge-graph-based training data to improve supervised fine-tuning of LLMs.

Velocity (7d): +2.0 ★/day · Trend: steady

GraphGen is a data synthesis system that generates training data from knowledge graphs to improve LLM fine-tuning. Its knowledge-driven pipelines produce question-answering pairs and other supervised fine-tuning (SFT) data. The framework integrates with training frameworks such as LLaMA-Factory and XTuner, and supports models including Qwen and LLaMA. Users generate diverse synthetic samples through structured graph traversal and question generation, then use them directly to fine-tune language models.
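To make the idea of graph traversal plus question generation concrete, here is a minimal Python sketch. It is not GraphGen's API or its actual pipeline: the toy triples, the function names (`build_adjacency`, `build_samples`), and the fixed question templates are all hypothetical, and a real system would presumably use an LLM rather than templates to phrase questions. The output uses the Alpaca-style `instruction` / `input` / `output` record layout, one of the dataset formats LLaMA-Factory accepts.

```python
import json
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples.
# Hypothetical data, for illustration only.
TRIPLES = [
    ("aspirin", "inhibit", "COX-1"),
    ("COX-1", "produce", "thromboxane A2"),
    ("aspirin", "treat", "fever"),
]


def build_adjacency(triples):
    """Index outgoing edges by head entity."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    return adjacency


def build_samples(triples):
    """Turn 1-hop and 2-hop paths into Alpaca-style QA records."""
    adjacency = build_adjacency(triples)
    samples = []

    # 1-hop: one question per edge.
    for head, relation, tail in triples:
        samples.append({
            "instruction": f"What does {head} {relation}?",
            "input": "",
            "output": tail,
        })

    # 2-hop: follow head -> mid -> tail and ask about the end of the path.
    for head, relation1, mid in triples:
        for relation2, tail in adjacency.get(mid, []):
            samples.append({
                # Naive "+s" conjugation; good enough for this toy vocabulary.
                "instruction": (
                    f"What does the entity that {head} {relation1}s {relation2}?"
                ),
                "input": "",
                "output": tail,
            })
    return samples


if __name__ == "__main__":
    print(json.dumps(build_samples(TRIPLES), indent=2))
```

Multi-hop questions are where the graph structure pays off: the answer cannot be read from a single triple, so the sample exercises a chain of facts rather than one lookup.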
