byzer-org/byzer-lang

SQL that swallows REST APIs and spits out Delta tables

Byzer-lang treats everything as a table, including JSON endpoints you'd rather not parse by hand.

1.8k stars · Scala · Data Tooling · ML Frameworks
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

Byzer-lang is a SQL-like DSL that wraps Spark and other compute frameworks so you never touch them directly. You write declarative pipelines to load data (from REST APIs, files, or elsewhere), transform it, run built-in ML algorithms, and save to Delta Lake, all in one dialect. The project also ships a notebook UI and VS Code extension for good measure.

The interesting bit

The “everything is a table” philosophy gets pushed further than usual: the README shows a GitHub API call ingested via a LOAD Rest statement, decoded from binary, JSON-expanded, and saved to Delta without leaving SQL syntax. It’s not just querying tables; it’s tabularizing the untabular.
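
To make the decode-then-expand steps concrete, here is a rough plain-Python sketch of the same idea. It is not Byzer syntax and not how Byzer implements it: the function names, the "content" column name, and the hard-coded sample payload are all assumptions made for illustration, and the final save-to-Delta step is left out.

```python
import json

# A "table" here is just a list of row dicts. The single row below is a
# made-up stand-in for a REST response body, not real GitHub API output.
raw_table = [{"content": b'{"login": "example-org", "id": 1}'}]


def decode_content(rows, col="content"):
    """Turn the binary response column into a text column."""
    return [{col: row[col].decode("utf-8")} for row in rows]


def expand_json(rows, col="content"):
    """Parse the JSON string and promote its top-level keys to columns."""
    return [json.loads(row[col]) for row in rows]


text_table = decode_content(raw_table)
expanded_table = expand_json(text_table)

for row in expanded_table:
    print(row)
```

Each step takes a table and returns a table, which is the property the README example leans on: the API response never stops being something you can query.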

Key highlights

  • Distributed execution backed by Spark (3.0.0 profile visible in build config), though you interact only with Byzer’s engine
  • Built-in extensions for JSON expansion, ML algorithms, and other data/AI tasks
  • Multiple entry points: local all-in-one package, Hadoop deployment, Docker, VS Code extension, or online trial
  • Active BIP (Byzer Improvement Proposal) process for community-driven design
  • Companion project Byzer Notebook for GUI workflow editing

Caveats

  • Documentation links in the README point heavily to Chinese-language pages; English coverage is unclear
  • Development setup requires specific IntelliJ Maven profiles (scala-2.12, spark-3.0.0, etc.) — not the smoothest on-ramp
  • The “low-code” claim is fair for pipeline logic, but you’ll still need to understand Spark deployment for production

Verdict

Worth a look if your team lives in SQL but keeps hitting friction with Python/Scala glue code for ETL and light ML. Skip it if you need deep framework control or a mature, English-first documentation ecosystem.
