CAMeL-Lab/camel_tools

Arabic NLP that doesn't treat the language as an afterthought

A research-backed Python toolkit for morphological analysis, dialect identification, and more—built by linguists who actually know Arabic.

554 stars · Python · Other AI
Velocity (7 days): +0.2 ★/day · Trend: steady

What it does

CAMeL Tools is a Python library for Arabic NLP developed at NYU Abu Dhabi’s CAMeL Lab. It covers the full pipeline from preprocessing through morphological analysis, disambiguation, and generation, plus higher-level tasks like dialect identification, named entity recognition, and sentiment analysis. Datasets are managed separately through a camel_data CLI and stored under ~/.camel_tools by default.
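As a rough illustration of what the preprocessing stage involves, here is a stdlib-only sketch of two common Arabic cleanup steps: stripping diacritics (plus the tatweel elongation character) and folding hamzated alef variants into bare alef. It is a hypothetical stand-in, not CAMeL Tools code; the library ships its own helpers for this, such as `dediac_ar` in `camel_tools.utils.dediac`. The character sets below are assumptions that cover only the most common marks.

```python
import re
import unicodedata

# Assumed diacritic set: fathatan (U+064B) through sukun (U+0652),
# plus the superscript "dagger" alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
# Tatweel (U+0640) only stretches a word visually and carries no meaning.
TATWEEL = "\u0640"
# Alef with madda, hamza above, hamza below, and wasla, all folded to bare alef.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625\u0671]")
BARE_ALEF = "\u0627"


def preprocess(text: str) -> str:
    """Normalize Unicode, drop tatweel and diacritics, and fold alef variants."""
    # NFKC maps Arabic presentation-form glyphs back to their base letters.
    text = unicodedata.normalize("NFKC", text)
    text = text.replace(TATWEEL, "")
    text = DIACRITICS.sub("", text)
    return ALEF_VARIANTS.sub(BARE_ALEF, text)
```

For example, `preprocess("\u0643\u064e\u062a\u064e\u0628\u064e")` turns the fully vowelled كَتَبَ into the bare كتب. Both steps are lossy: they merge spellings that differ in the original text, so whether to apply them depends on the downstream task.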

The interesting bit

Arabic’s morphological complexity (roots, patterns, clitics, and rampant ambiguity) makes off-the-shelf NLP libraries stumble. This toolkit was built by computational linguists who publish on the topic, not by a product team bolting Arabic support onto a multilingual framework. The Rust compiler requirement may point to compiled, performance-sensitive components, though it could just as well come from a third-party dependency that is built from source.
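To make the clitic ambiguity concrete, here is a toy segmenter that tries every proclitic + stem + enclitic split of an unvowelled word against a tiny lexicon. The lexicon, glosses, and Buckwalter-style transliteration are hypothetical and chosen for illustration; a real analyzer such as CAMeL Tools’ draws on a full morphological database rather than a handful of entries.

```python
from itertools import product

# Hypothetical toy lexicon in Buckwalter-style transliteration (vowels are not written).
STEMS = {
    "fhm": ["fahm (noun: understanding)", "fahima (verb: he understood)"],
    "hm": ["hum (pronoun: they)"],
    "ktb": ["kutub (noun: books)", "kataba (verb: he wrote)"],
}
# The empty string stands for "no clitic attached".
PROCLITICS = {"": None, "f": "fa+ (so, then)", "w": "wa+ (and)"}
ENCLITICS = {"": None, "h": "+hu (his, him)", "hA": "+hA (her)"}


def analyses(word: str) -> list[str]:
    """Return every proclitic + stem + enclitic split whose stem is in the lexicon."""
    results = []
    for pro, enc in product(PROCLITICS, ENCLITICS):
        if not (word.startswith(pro) and word.endswith(enc)):
            continue
        stem = word[len(pro):len(word) - len(enc)]
        for gloss in STEMS.get(stem, []):
            # Join only the parts that are present (clitics may be absent).
            parts = [p for p in (PROCLITICS[pro], gloss, ENCLITICS[enc]) if p]
            results.append(" ".join(parts))
    return results
```

Even with three stems, `analyses("fhm")` returns three readings: "understanding", "he understood", and "so they". Picking the right one requires context, which is the job of the disambiguation step.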

Key highlights

  • Morphological analysis, disambiguation, generation, and reinflection (the full paradigm)
  • Dialect identification across Arabic varieties
  • Named entity recognition and sentiment analysis
  • Command-line tools plus a Python API
  • Guided tour notebook and ReadTheDocs documentation
  • MIT licensed, with an established academic citation
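Generation and reinflection rest on Arabic’s root-and-pattern (templatic) morphology. The sketch below illustrates that idea in miniature by slotting a triliteral root into vowel patterns. The patterns, glosses, and Latin transliteration are hypothetical; CAMeL Tools’ own generator works from a morphological database rather than this kind of slot filling.

```python
# Hypothetical patterns in a simple Latin transliteration: digits mark root-consonant
# slots, and doubled vowels (aa, ii, uu) mark long vowels.
PATTERNS = {
    "1a2a3a": "perfective verb (he did X)",
    "1aa2i3": "active participle (one who does X)",
    "ma12uu3": "passive participle (thing X was done to)",
}


def interdigitate(root: str, pattern: str) -> str:
    """Fill slots 1, 2 and 3 of a pattern with the consonants of a triliteral root."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    slots = {"1": root[0], "2": root[1], "3": root[2]}
    # Digits become root consonants; every other character is copied through.
    return "".join(slots.get(ch, ch) for ch in pattern)


def generate(root: str) -> dict[str, str]:
    """Apply every known pattern to one root."""
    return {pattern: interdigitate(root, pattern) for pattern in PATTERNS}
```

With the root k-t-b, `generate("ktb")` yields kataba ("he wrote"), kaatib ("writer"), and maktuub ("written"): one root, three related words, which is why Arabic lexicons are often organized by root.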

Caveats

  • Requires Python 3.10–3.14 plus CMake, Boost, and Rust, so installation is not a simple pip install on every system
  • Dialect identification is unavailable on Windows
  • Apple Silicon Macs need the CMAKE_OSX_ARCHITECTURES=arm64 environment variable set when installing
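Since these caveats are mostly environment checks, they can be scripted before attempting an install. The sketch below is a hypothetical preflight helper (not part of CAMeL Tools) that checks the Python version against the 3.10–3.14 range listed here, looks for `cmake` and `rustc` on `PATH`, and flags the Windows and Apple Silicon cases. It cannot detect Boost, which is a library rather than an executable, and the version bounds are a snapshot that will drift as the project evolves.

```python
import os
import platform
import shutil
import sys
from typing import Callable, Mapping, Optional

# Bounds copied from the caveats; assumed inclusive on both ends.
MIN_PY, MAX_PY = (3, 10), (3, 14)
# Boost is a library, not an executable, so it cannot be found this way.
REQUIRED_TOOLS = ("cmake", "rustc")


def preflight(
    version: Optional[tuple] = None,
    system: Optional[str] = None,
    machine: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return human-readable warnings about the documented install prerequisites."""
    # Fall back to the current interpreter and machine when nothing is injected.
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    machine = machine or platform.machine()
    env = os.environ if env is None else env

    warnings = []
    if not MIN_PY <= version <= MAX_PY:
        warnings.append(f"Python {version[0]}.{version[1]} is outside 3.10-3.14")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            warnings.append(f"{tool} not found on PATH")
    if system == "Windows":
        warnings.append("dialect identification is unavailable on Windows")
    if system == "Darwin" and machine == "arm64" and env.get("CMAKE_OSX_ARCHITECTURES") != "arm64":
        warnings.append("set CMAKE_OSX_ARCHITECTURES=arm64 before installing")
    return warnings
```

Calling `preflight()` with no arguments inspects the current machine; the injectable parameters exist so each check can be exercised without changing the real environment.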

Verdict

Worth a look if you’re doing serious Arabic text processing and need linguistically informed tools rather than generic multilingual models. Skip it if you just need quick translation or basic Arabic tokenization—there are lighter options for that.
