logicalclocks/hopsworks

An open-source AI lakehouse that actually admits it's heavy

Hopsworks bundles a feature store, MLOps pipeline tooling, and team governance into one Java-heavy platform that you can run in the cloud or on-prem, or pay Hopsworks to manage.

Velocity (7d): +0.5 ★/day · Trend: steady

What it does

Hopsworks is a self-described “Real-Time AI Lakehouse” built around a Python-centric feature store. It gives ML teams a shared workspace for feature engineering, model registry, training pipelines, and model serving, with project-based multi-tenancy so different teams can safely share a single cluster. The platform also bundles Jupyter notebooks, Airflow for pipeline orchestration, and support for Spark, Flink, and GPU training.

The interesting bit

The project-based sandbox model is the unusual angle: it treats ML assets (features, models, training data) as governed, versioned resources that can be shared across team boundaries without dumping everyone into the same namespace. That’s the governance pitch—sensitive data stays isolated, collaboration happens anyway.
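To make the sandbox idea concrete, here is a minimal toy model in plain Python. It is not the Hopsworks API: the `Project` and `Asset` classes and the `publish`, `share`, and `get` methods are hypothetical names invented for illustration. The sketch only shows the general pattern the paragraph describes, where each project owns versioned assets and another project sees an asset only after an explicit share.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """A versioned ML asset (for example a feature group) owned by one project."""
    name: str
    version: int
    owner: str


@dataclass
class Project:
    """Hypothetical project sandbox: owns assets, sees others only via explicit shares."""
    name: str
    owned: dict = field(default_factory=dict)      # (name, version) -> Asset
    shared_in: dict = field(default_factory=dict)  # assets shared by other projects

    def publish(self, name: str, version: int) -> Asset:
        """Register a new asset version inside this project."""
        asset = Asset(name, version, owner=self.name)
        self.owned[(name, version)] = asset
        return asset

    def share(self, name: str, version: int, other: "Project") -> None:
        """Grant another project access to one specific version."""
        other.shared_in[(name, version)] = self.owned[(name, version)]

    def get(self, name: str, version: int) -> Asset:
        """Look up an asset; fail if it is neither owned nor shared in."""
        key = (name, version)
        if key in self.owned:
            return self.owned[key]
        if key in self.shared_in:
            return self.shared_in[key]
        raise PermissionError(f"{self.name} has no access to {name} v{version}")


fraud = Project("fraud")
marketing = Project("marketing")
fraud.publish("transactions", 1)
fraud.publish("transactions", 2)
fraud.share("transactions", 1, marketing)  # only version 1 crosses the boundary
```

In this toy, `marketing.get("transactions", 1)` succeeds while `marketing.get("transactions", 2)` raises `PermissionError`, which is the "isolated by default, shared on purpose" behaviour in miniature.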

Key highlights

  • Modular by design: usable as standalone feature store, full MLOps platform, or anything between
  • Multi-platform: managed cloud (AWS/Azure/GCP), on-prem Linux installs, or serverless app at app.hopsworks.ai
  • AGPL-3.0 licensed: copyleft, so anyone who distributes a modified version or offers it as a network service must make the modified source available under the same license
  • Integrates with Databricks, SageMaker, and Kubeflow per the README
  • On-prem requires 32 GB RAM, 8 CPUs, and direct engagement with Hopsworks engineering for setup
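Before opening an on-prem conversation, it can help to check a candidate host against the figures quoted above. The sketch below is a hypothetical helper, not part of any Hopsworks installer. It treats 32 GB and 8 CPUs as minimums, which is an assumption, and `detect_host` relies on `os.sysconf`, so it only works on Unix-like systems.

```python
import os

# Figures quoted in the highlights above; treating them as minimums is an assumption.
MIN_RAM_GB = 32
MIN_CPUS = 8


def meets_minimum(ram_gb: float, cpus: int) -> bool:
    """Return True if the given resources satisfy both quoted figures."""
    return ram_gb >= MIN_RAM_GB and cpus >= MIN_CPUS


def detect_host():
    """Return (ram_gb, cpus) for this machine using os.sysconf (Unix-like only)."""
    cpus = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return ram_bytes / 1024**3, cpus


if __name__ == "__main__":
    ram_gb, cpus = detect_host()
    status = "OK" if meets_minimum(ram_gb, cpus) else "below the quoted figures"
    print(f"{ram_gb:.1f} GiB RAM, {cpus} CPUs: {status}")
```

A machine sold as "32 GB" usually reports slightly less usable memory, so a strict comparison like this one can flag it as too small. Loosen the threshold if that matters for your hardware.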

Caveats

  • On-premise installation is explicitly not self-serve: “each infrastructure is unique and requires a tailored approach”, so expect a professional-services engagement
  • The serverless app is labeled beta
  • Java repo with 1,299 stars; the actual Python APIs live in separate repositories

Verdict

Worth evaluating if your team has outgrown ad-hoc feature storage and needs governed collaboration across multiple projects. Skip it if you want a lightweight, drop-in feature store without the full-platform commitment.
