kreuzberg-dev/html-to-markdown
High-performance HTML-to-Markdown converter with bindings for multiple languages, developed as part of a document intelligence platform that also provides OCR.

This library converts HTML to Markdown following the CommonMark specification. It is maintained as part of the Kreuzberg project, a polyglot document intelligence engine with a Rust core that uses streaming parsers and built-in optical character recognition (OCR) to extract structured data from more than 56 document formats. The converter is available in multiple language ecosystems (Rust, Python, Node.js, Java, Go, C#, PHP, Ruby), and the project is tagged for use in retrieval-augmented generation (RAG) pipelines.
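To illustrate the kind of tag-to-syntax mapping an HTML-to-Markdown converter performs, here is a minimal sketch built only on Python's standard-library `html.parser`. It is a hypothetical toy: `ToyMarkdownConverter` and `to_markdown` are illustrative names, not this library's API or implementation. It handles only headings, paragraphs, emphasis, links, and line breaks, and it skips lists, tables, Markdown escaping, and whitespace normalization, all of which a real converter must address.

```python
from html.parser import HTMLParser


class ToyMarkdownConverter(HTMLParser):
    """Hypothetical toy converter: maps a handful of HTML tags to CommonMark syntax."""

    # Inline tags and the Markdown delimiter emitted on both open and close.
    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.out = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            # ATX heading: one '#' per heading level.
            self.out.append("#" * int(tag[1]) + " ")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href", "")
            self.out.append("[")
        elif tag == "br":
            # CommonMark hard line break: a backslash at the end of the line.
            self.out.append("\\\n")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            # Block elements are separated by a blank line.
            self.out.append("\n\n")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.out.append(data)


def to_markdown(html: str) -> str:
    """Convert a small HTML fragment to Markdown using the toy converter."""
    parser = ToyMarkdownConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


if __name__ == "__main__":
    html = ('<h1>Title</h1><p>Some <strong>bold</strong> and <em>italic</em> '
            'text with a <a href="https://example.com">link</a>.</p>')
    print(to_markdown(html))
```

Running the script prints a level-one ATX heading followed by a paragraph containing `**bold**`, `*italic*`, and `[link](https://example.com)`. A production converter would typically build a document tree rather than appending strings, so that nesting, escaping, and spacing rules can be applied consistently.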