bazingagin/npc_gzip
A parameter-free text classification method that uses data compressors (gzip, bz2, lzma) to classify text without any model training.

Velocity (7d): +1.6 ★/day · Trend: steady
This repository implements a text classification approach that uses compression algorithms to measure similarity between text documents. The method compresses concatenated pairs of texts and compares the compressed sizes to derive a distance metric: two texts that share content compress better together than two unrelated texts. Because the distance comes straight from the compressor, no model parameters need to be trained. The repository supports multiple datasets (AG_NEWS, DBpedia, YahooAnswers, etc.) and three compressors (gzip, lzma, bz2).
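
The compression-based distance described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual API: the function names `compressed_len`, `ncd`, and `classify` are hypothetical, the distance is assumed to be the standard normalized compression distance, joining the two texts with a space is an assumption, and the nearest-neighbor vote is one plausible way to turn distances into labels.

```python
import gzip
from collections import Counter


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two texts (assumed metric)."""
    ca, cb = compressed_len(a), compressed_len(b)
    # Assumption: the pair is joined with a single space before compressing.
    cab = compressed_len(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """Label query by majority vote among its k nearest training texts.

    train is a list of (text, label) pairs; hypothetical helper.
    """
    ranked = sorted(train, key=lambda pair: ncd(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ("The team won the championship after a late goal in extra time.", "sports"),
        ("Shares fell sharply as investors reacted to the interest rate rise.", "business"),
    ]
    print(classify("The striker scored a late goal to win the match.", train))
```

Swapping `gzip.compress` for `bz2.compress` or `lzma.compress` from the standard library would change the compressor without touching the rest of the sketch. Note that every query is compressed against every training text, so the cost grows with the size of the training set.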