lefterisloukas/edgar-crawler

Open-source toolkit that downloads SEC EDGAR financial filings and parses them into structured JSON for downstream NLP analysis.

524 stars · Python · Data Tooling
Velocity (7d): +0.3 ★ / day · Trend: steady

The tool retrieves raw financial filings from the SEC EDGAR system and extracts individual item sections from 10-K, 10-Q, and 8-K filings into a standardized JSON format. It is designed to serve as data infrastructure for financial NLP research and was used to produce the EDGAR-CORPUS dataset on HuggingFace. The project was presented at WWW 2025 and at multiple NLP workshops.
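To make "item sections into JSON" concrete, here is a minimal sketch of the general idea in Python. It is not edgar-crawler's actual code or API: the heading pattern, the `split_items` helper, the `item_*` key names, and the sample text are all assumptions made for illustration, and real filings need far more robust parsing.

```python
import json
import re

# Assumed heading pattern (illustrative only): a line starting with
# "Item 1.", "Item 1A.", "ITEM 7:", and so on.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[a-c]?)\s*[.:]",
    re.IGNORECASE | re.MULTILINE,
)


def split_items(filing_text: str) -> dict:
    """Split plain filing text into sections keyed by item number.

    Hypothetical helper: each section runs from just after one item
    heading to the start of the next heading (or the end of the text),
    so the section text still begins with the heading's title.
    """
    matches = list(ITEM_HEADING.finditer(filing_text))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(filing_text)
        key = f"item_{match.group(1).upper()}"
        sections[key] = filing_text[match.end():end].strip()
    return sections


# Made-up sample text standing in for a downloaded 10-K.
sample = """Item 1. Business
We make widgets.
Item 1A. Risk Factors
Demand may fall.
Item 7. Management's Discussion
Revenue grew.
"""

# One JSON record per filing: metadata plus one field per item section.
record = {"filing_type": "10-K", **split_items(sample)}
print(json.dumps(record, indent=2))
```

Keeping one flat JSON record per filing, with one field per item, is a common shape for NLP pipelines because each section can be loaded and tokenized independently.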
