malaysia-ai/malaya

NLP for a language the big toolkits forgot

Malaya gives Malaysian developers first-class PyTorch models for Malay, a language that NLTK and spaCy barely touch.

525 stars · Jupyter Notebook · ML Frameworks · Language Models
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

Malaya is a PyTorch-based NLP toolkit purpose-built for Bahasa Malaysia. It covers the standard suspects (sentiment analysis, named entity recognition, POS tagging, language detection, text normalization), but with models tuned for Malaysian Malay rather than English models pressed into service in the hope that they transfer.

The interesting bit

The project is old enough to have TensorFlow in its topic tags and young enough to have switched to PyTorch, which suggests actual maintenance rather than abandonware. Pretrained models live on Hugging Face under the mesolitica organization, so you’re not stuck training from scratch on a low-resource language.

Key highlights

  • Supports Python 3.6+ and PyTorch 1.10+; installing PyTorch is left to you, so you choose the CPU or GPU build
  • Models hosted at huggingface.co/mesolitica
  • Windows users get dedicated docs (always a tell that someone has suffered)
  • Research-backed: includes a BibTeX citation and acknowledges TPU access through the TensorFlow Research Cloud (TFRC), suggesting serious training runs
  • Active enough to have a Discord community
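
Since PyTorch installation is left to you, it can be worth confirming the environment meets the stated floors (Python 3.6+, PyTorch 1.10+) before installing the toolkit. The following is a minimal sketch, not part of Malaya itself: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the version comparison is a simplified numeric check rather than a full PEP 440 parser.

```python
import importlib.util
import re
import sys


def parse_version(text):
    """Return the leading numeric components of a version string as ints."""
    # "1.13.1+cu117" -> (1, 13, 1); build suffixes such as "+cu117" are ignored.
    match = re.match(r"\d+(?:\.\d+)*", str(text))
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """True if `version` is at least `minimum`; both are version strings."""
    have, need = parse_version(version), parse_version(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that "1.10" and "1.10.0" compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_environment():
    """Report whether Python and any installed PyTorch meet the stated floors."""
    report = {"python_ok": sys.version_info >= (3, 6)}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is absent; pick a CPU or GPU build and install it yourself.
        report["torch_ok"] = None
    else:
        import torch

        report["torch_ok"] = meets_minimum(torch.__version__, "1.10")
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    print(check_environment())
```

A `torch_ok` value of `None` means PyTorch is not installed yet, which is the expected state on a fresh environment given that the choice of build is yours.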

Caveats

  • The README is thin on specifics: no model sizes, no benchmark numbers, no latency claims
  • Jupyter Notebook as the repo language suggests heavy docs/examples; the actual library structure is unclear from the README alone
  • “Entity framework” in the GitHub topics appears to be a tag misfire, not an ORM

Verdict

Worth a look if you’re building Malay-language products and tired of forcing multilingual models to cope with local slang and syntax. Skip it if your use case is English-dominant; you already have better-supported options.
